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Foreword 



Linearized Boltxmann transport equation is extensively used in nuclear engineering to 
assess the transport behaviour of radiation in bulk materials. The solution to this equation 
provides detailed knowledge about the spatial and temporal distribution of particle flue 
(of neutrons, photons, etc. All engineering quantities of practical interest such as heat 
produced, reaction rates, effective multiplication factors, radiation dose etc., are derived 
therefrom. This equation is not solvable in closed form except in the simplest of the 
situations. Therefore, numerical methods of solution are used in almost all applications. 
Many standardized computer codes have been developed, and validated against data from 
benchmark experiments. 

Numerical techniques of solving the equation also turn out to be inadequate if the 
geometry of the problem is complex, as it happens very often in real life situations. In 
these instances, Monte Carlo simulation of the physical processes contained in the equation 
is possibly the only way out. This technique, started around late 1940's, has developed 
considerably over the years and has now reached a high level of sophistication. As a 
general technique, it finds application in almost all branches of science as well. 

There is a tendency among the many practitioners of the Monte Carlo technique to 
use the method (particularly the readily available computer codes) like a black box, little 
realising that care and caution are called for in the choice of random mmibcr generators, 
in ensuring sampling adequacy, or in interpreting the statistical results obtained. There 
are excellent books dealing with the underlying theory of Monte Carlo games; studying 
the books requires some depth in mathematics, scaring away the beginner. This mono- 
graph is intended to present the basics of the Monte Carlo method in simple terms. The 
mathematical equipment required is not more than college level calculus and notational 
familiarity with primitive set theory. All the essential ideas required for an appreciation 
of the Monte Carlo technique - its merits, pitfalls, limitations - are presented in lucid 
pedagogic style. 

Dr. K. P. N. Murthy, the author of this monograph, has well over twenty five years 
of research experience in the application of Monte Carlo technique to problem in physical 
sciences, he has also taught this subject on a number of occassions and has been a source of 
inspiration to young and budding scientists. The conciseness and clarity of the monograph 
give an indication of his mastery overy the subject. 

It is my great pleasure, as President of the Indian Society for Radiation Physics, 
Kalpakkam Chapter, to be associated with the publication of this monograph by the 
society. Like all other popular brochures and technical documents brought out earlier, I 
am sure, this monograph will be well received. 

A. Natarajan 

President, Indian Society for Radiation Physics 

(Kalpakkam Chapter) 
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Preface 



Monte Carlo is a powerful numerical technique useful for solving several complex problems. 
The method has gained in importance and popularity owing to the easy availability of 
high-speed computers. 

In this monograph I shall make an attempt to present the theoretical basis for the 
Monte Carlo method in very simple terms: sample space, events, probability of events, 
random variables, mean, variance, covariance, characteristic function, moments, cumu- 
lants, Chebyshev inequality, law of large numbers, central limit theorem, generalization of 
the central limit theorem through Levy stable law, random numbers, generation of pseudo 
random numbers, randomness tests, random sampling techniques: inversion, rejection and 
Metropolis rejection; sampling from a Gaussian, analogue Monte Carlo, variance reduc- 
tion techniques with reference to importance sampling, and optimization of importance 
sampling. I have included twenty-one assignments, which are given as boxed items at 
appropriate places in the text. 

While learning Monte Carlo, I benefited from discussions with many of my colleagues. 
Some are P. S. Nagarajan, M. A. Prasad, S. R. Dwivedi, P. K. Sarkar, C. R. Gopalakrish- 
nan, M. C. Valsakumar, T. M. John, R. Indira, and V. Sridhar. I thank all of them and 
several others not mentioned here. 

I thank V. Sridhar for his exceedingly skillful and enthusiastic support to this project, 
in terms of critical reading of the manuscript, checking explicitly the derivations, correct- 
ing the manuscript in several places to improve its readability and for several hours of 
discussions. 

I thank M. C. Valsakumar for sharing his time and wisdom; for his imagination, sen- 
sitivity and robust intellect which have helped me in my research in general and in this 
venture in particular. Indeed M. C. Valsakumar has been and shall always remain a 
constant source of inspiration to all my endeavours. 

I thank R. Indira for several hours of discussions and for a critical reading of the 
manuscript. 

The first draft of this monograph was prepared on the basis of the talks I gave at the 
Workshop on Criticality calculations using KENO, held at Kalpakkam during September 
7-18, 1998. I thank A. Natarajan and C. R. Gopalakrishnan for the invitation. 
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Subsequently, I spent a month from October 5, 1998, at the School of Physics, Univer- 
sity of Hyderabad, as a UGC visiting fellow. During this period I gave a course on 'Monte 
Carlo: Theory and Practice'. This course was given in two parts. In the first part, I cov- 
ered the basics and in the second part I discussed several applications. This monograph 
is essentially based on the first part of the Hyderabad course. I thank A. K. Bhatnagar, 
for the invitation. I thank V. S. S. Sastri, K. Venu and their students for the hospitality. 

This monograph in the present form was prepared after several additions and extensive 
revisions I made during my stay as a guest scientist at the Institiit fiir Fcstkorpcrforschung, 
Forschungszentrum Jiilich, for three months starting from 22 July 1999. I thank Klaus 
W. Kehr for the invitation. I thank Forschungszentrum Jiilich for the hospitality. 

I thank Klaus W. Kehr, Michael Krenzlin, Kiaresch Mussawisade, Ralf Sambeth, Karl- 
Heinz Herrmann, D. Basu, M. Vijayalakshmi and several others, for the wonderful time I 
had in Jiilich. 

I thank Awadesh Mani, Michael Krenzlin, Ralf Sambeth, Achille Giacometti, S. Ra- 
jasekar Subodh R. Shenoy and S. Kanmani for a critical reading of the manuscript and 
for suggesting several changes to improve its readability. 

I owe a special word of thanks to A. Natarajan; he was instrumental in my taking 
up this project. He not only encouraged me into writing this monograph but also under- 
took the responsibility of getting it published through the Indian Society for Radiation 
Physics (ISRP), Kalpakkam Chapter, on the occassion of the Workshop on Monte Carlo: 
Radiation Transport, February 7 - 18, 2000, at Kalpakkam, conducted by the Safety Re- 
search Institute of Atomic Energy Regulatory Board (AERB). I am very pleased that A. 
Natarajan has written the foreword to this monograph. 

I have great pleasure in dedicating this monograph to two of the wonderful scientists 
I have met and whom I hold in very high esteem: Prof. Dr. Klaus W. Kehr, Jiilich, 
who retired formally in July 1999, and Dr. M. A. Prasad, Mumbai, who is retiring 
formally in March 2000. I take this opportunity to wish them both the very best. 

Kalpakkam, 

January 9, 2000. K. P. N. Murthy 
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1 INTRODUCTION 



Monte Carlo is a powerful numerical technique that makes use of random numbers to solve 
a problem. I assume we all know what random numbers are. However this issue is, by no 
means, trivial and I shall have something to say on this later. 

Historically, the first large scale Monte Carlo work carried out dates back to the mid- 
dle of the twentieth century. This work pertained to studies of neutron multiplication, 
scattering, propagation and eventual absorption in a medium or leakage from it. Ulam, 
von Neumann and Fermi were the first to propose and employ the Monte Carlo method 
as a viable numerical technique for solving practical problems. 

There were of course several isolated and perhaps not fully developed instances earlier, 
when Monte Carlo has been used in some form or the other. An example is the experi- 
ment performed in the middle of the nineteenth century, consisting of throwing a needle 
randomly on a board notched with parallel lines, and inferring the value of vr from the 
number of times the needle intersects a line; this is known as Buffon's needle problem, see 
for example 

The Quincunx constructed by Galton toward the end of the nineteenth century, 
consisted of balls rolling down an array of pins (which deflect the balls randomly to their 
left or right) and getting collected in the vertical compartments placed at the bottom. 
The heights of the balls in the compartments approximate the binomial distribution. This 
Monte Carlo experiment is a simple demonstration of the Central Limit Theorem. 

In the nineteen twenties, Karl Pearson perceived the use of random numbers for solving 
complex problems in probability theory and statistics that were not amenable to exact 
solutions. Pearson encouraged L. H. C. Tippet to produce a table of random numbers to 
help in such studies, and a book of random sampling numbers 1^] was published in the 
year 1927. This was followed by another publication of random numbers by R. A. Fisher 
and F. Yates. Pearson and his students used this method to obtain the distributions of 
several complex statistics. 

In India, P. C. Mahalanbois Q exploited ' random sampling ' technique to solve a 
variety problems like the choice of optimum sampling plans in survey work, choice of 
optimum size and shape of plots in experimental work etc. , see [^] . 

Indeed, descriptions of several modern Monte Carlo techniques appear in a paper 
by Kelvin Q, written nearly hundred years ago, in the context of a discussion on the 
Boltzmann equation. But Kelvin was more interested in the results than in the technique, 
which to him was obviousl 

Monte Carlo technique derives its name from a game very popular in Monaco. The 
children get together at the beach and throw pebbles at random on a square which has 
a circle inscribed in it. From the fraction of the pebbles that fall inside the circle, one 
can estimate the value of vr. This way of estimating the value of vr goes under the name 
rejection technique. A delightful variant of the game is discussed by Krauth Q; this variant 
brings out in a simple way the essential principle behind the Metropolis rejection method. 
We shall discuss the rejection technique as well as the Metropolis rejection technique later. 

In this monograph I shall define certain minimal statistical terms and invoke some 
important results in mathematical statistics to lay a foundation for the Monte Carlo 
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methods. I shall try to make the presentation as simple and as complete as possible. 

2 SAMPLE SPACE, EVENTS AND PROBABILITIES 

Consider an experiment, real or imagined, which leads to more than one outcome. We 
collect all the possible outcomes of the experiment in a set and call it the sample space. 
An outcome is often denoted by the symbol to. Certain subsets of are called events. 
The class of all events is denoted by J^. For every pair of events Ai £ and A2 G J^, if 
.4i U Ai n A2, and ^1 — A2 are also events G T, then is called a field. To each 
event A we assign a real number < V{A) < 1 called the probability of the event. If 0, 
consists of infinitely many outcomes, we demand that U^^Ai and D^^Ai be also events 
and form a Borel field of all possible events which includes the null event (p. We have 
r{(j)) = 0, V{n) = 1 andP(^iU^2) = P(^i) + ^(^2) - n ^2)- Events ^1 and ^2 
are disjoint (or mutually exclusive) if ^1 n ^2 = 

Not all the subsets of 0, shall be events. One reason for this is that we may wish 
to assign probabilities to only some of the subsets. Another reason is of mathematical 
nature: we may not be able to assign probabilities to some of the subsets of at all. The 
distinction between subsets of O and events and the consequent concept of Borel field are 
stated here simply for completeness. In applications, all reasonably defined subsets of Q 
would be events. 

How do we attach probabilities to the events ? 

Following the classical definition, we say that V{A) is the ratio of the number of outcomes 
in the event A to that in $7, provided all the outcomes are equally likely. To think of 
it, this is the method we adopt intuitively to assign probabilities - like when we say the 
probability for heads in a toss of a coin is half; the probability for the number, say one, 
to show up in a roll of a die is one-sixth; or when we say that all micro states are equally 
probable while formulating statistical mechanics. An interesting problem in this context 
is the Bertrand paradox Q; you are asked to find the probability for a randomly chosen 
chord in a circle of radius r to have a length exceeding r-v/3. You can get three answers 
1/2, 1/3, and 1/4, depending upon the experiment you design to select a chord randomly. 
An excellent discussion of the Bertrand paradox can be found in p. 
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Assignment 1 

(A) A fine needle of length 2a is dropped at random on a board covered 
with parallel lines with distance 26 apart. Show that the probability that 
the needle intersects one of the lines equals laj-nh. See page 131 of 

(B) Devise an experiment to select randomly a chord in a circle of radius 
r. What is the probability that its length exceeds r\/3 ? See page 9-10 of 

. What is the probability distribution of the chord length ? 



Alternately, we can take an operational approach. Vi^A) is obtained by observing the 
frequency of occurrence of A: repeat the experiment some times and let Nj^ be the 
number of times the event A occurs. Then Nj^jN in the limit of — > oo gives the 
probability of the event. 

As seen above, the formal study of probability theory requires three distinct notions 
namely the sample space il, the (Borel) field T and the probability measure V . See Pa- 
poulis Q and Feller |]l^. The physicists however, use a different but single notion, namely 
the ensemble. Consider for example a sample space that contains discrete outcomes. An 
ensemble is a collection whose members are the elements of the sample space but repeated 
as many times as would reflect their probabilities. Every member of O finds a place in the 
ensemble and every member of the ensemble is some element of the sample space. The 
number of times a given element (outcome) of the sample space occurs in the ensemble 
is such that the ratio of this number to the total number of members in the ensemble 
is, exactly, the probability associated with the outcome. The number of elements in the 
ensemble is strictly infinity. 

The molecules of a gas in equilibrium is a simple example of an ensemble; the speeds 
of the molecules at any instant of time have Maxwell-Boltzmann distribution, denoted by 
"piy). Each molecule can then be thought of as a member of the ensemble; the number of 
molecules with speeds between v\ and V2, divided by the total number of molecules N in 
the gas is J^^p{v)dv. 

3 RANDOM VARIABLES 

The next important concept in the theory of probability is the definition of a (real) ran- 
dom variable. A random variable, denoted by X{uj), is a set function: it attaches a real 
number x to an outcome to. A random variable thus maps the abstract outcomes to the 
numbers on the real line. It stamps each outcome, so to say, with a real number. 

Consider an experiment whose outcomes are discrete and are denoted by {uJi}, with 
i running from 1 to say A^. Let X{uj) be the random variable that maps the abstract 
outcomes {wj} to real numbers {xi}. Then pi = 'P{uj\X{lj) = Xi) gives the probability 
that the random variable X{uj) takes a real value Xj. The probabilities {pi} obey the 
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conditions, 



< Pi < 1, V 

Eft = 1- (1) 

i=l 

Let me illustrate the above ideas through a few examples. 
Tossing of a single coin 

The simplest example is the tossing of a single coin. The sample space is O = {H, T}, 
where H denotes the outcome Heads and T the Tails. There are four possible events : 

J' = {{H},{T},{H,T},{cl>}}. 

The corresponding probabilities are 1/2, 1/2, 1 and for a fair coin. 

We can define a random variable X{oj) by attaching +1 to H and — 1 to T, see Table 
(1). Then we say that the probability for the random variable X to take a value +1 
is V[lu\X(uj) = 1] = 1/2 and that for —1 is half. This defines the discrete probability 
distribution: p(l) = 1/2, p(-l) = 1/2, see Table (1). 

Table 1: Random variable and probability for the toss of a single coin. 



UJ 


X = X{uj) 


V [uj\X{uj) = x] 


H 


+1 


1/2 


T 


-1 


1/2 



The physicists' ensemble containing J\f members would be such that half of M are 
Heads and half Tails. If the probability of Heads is 3/4 and of Tails is 1/4, then the 
corresponding ensemble would contain 3Af/4 heads and Af/4 Tails. 

Tossing of several coins 

Consider now the tossing of N independent fair coins. Let Xi, X2, • • • Xn be the corre- 
sponding random variables. Xi(H) = +1 and Xi(T) = — 1 V i = 1, 2, • • • , A^. For each 
coin there are two possible (equally likely) outcomes. Therefore, for N coins there are 2-^ 
possible (equally likely) outcomes. For the A^-coin-tossing experiment each outcome is a 
distinct string of H and T, the string length being N. 
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Table 2: Random variable and Probability for the three - coin -tossing experiment 







V [oJ^io) = x] 


HHH 


+3 


1/8 


HHT 
JTL ± n 
THH 




o 1 o 


TTH 
THT 
HTT 


-1 


3/8 


TTT 


-3 


1/8 



An example with = 3 is shown in Table (2), where the random variable 13(^1-') is 
defined by the sum: = Xi + X2 + X^. 

The probability for each outcome (string) in the throw of coins is thus 2"^. Let 
us define a random variable = Xi + X2 + • • • + X^. The probability for n heads in a 
toss of coins, which is the same as the probability for the random variable Yat to take 
a value n — [N — n) is given by the binomial distribution, 

P(n,Ar).P[y, = 2n-Ar] = ^-^^^. (2) 

Fig. [H depicts the binomial distribution for A^ = 6. 

For the general case of a loaded coin, with probability for Heads as p and for the Tails 
as g = 1 — p, we have, 

N) ^V[YM = 2n-N]= , g^"". (3) 

n\ (A — n)! 
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Figure 1: Binomial distribution, see Eq. (^) with = 6 



Rolling of a fair die 

Another simple example consists of rolling a die. There are six possible outcomes. The 
sample space is given by, 

n = 



The random variable X{u)) attaches the numbers {xi : z = 1, 2, • • • , 6} = {1, 2, 3, 4, 5, 6} 
to the outcomes {uji : i = 1, 2, • • • , 6}, in the same order. If the die is not loaded we have 
p{xi) = 1/6 V z = 1, 2, • • • , 6. An example of an event, which is the subset of fi, is 



A 

and V{A) = 1/2. This event corresponds to the roll of an odd number. Consider the event 

B = 



which corresponds to the roll of an even number. It is clear that the events A and B can 
not happen simultaneously; in other words a single roll of the die can not lead to both the 
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events A and B. The events A and B are disjoint (or mutually exclusive. If we define 
another event 



then it is clear that A and C are not disjoint. We have 

AnC = I • 

Similarly the events B and C are not disjoint 

Bnc = 



Assignment 2 

Consider rolling of a fair die N times. Let denote the number of times 
the number k shows up and k runs from 1 to 6. Find an expression for 
the probability P{ni,n2,n3,n4,n5,nQ, N). What is the probability distri- 
bution of the random variable n = X]fc=i ^ '^fc'^ 



Poisson distribution 

An important discrete distribution, called the Poisson distribution, arises in the context 
of random occurrences in time. In an interval At — > positioned at any time t, there is 
either one occurrence with probability XAt, or no occurrence with probability 1 — XAt. 
Here is the characteristic time constant of the Poisson process. 

Let P{n,t) denote the probability that there are n occurrences in the interval [0,t]. A 
master equation can be readily written down as, 

P{n,t) = XAt P{n-l,t- At) + [1- \At]P{n,t- At), 
P{n,t = 0) = 5n,o. (4) 

To solve for P(n,t), define the generating function, 

oo 

P(z,t) = ^z"P(n,t). (5) 

n=0 

Multiplying both sides of Eq. (^) by and summing over n from to oo, we get, 

P{z, t) = \zAtP{z, t- At) + [l- \At]P{z, t - At), 
P{z,t = 0) = 1. (6) 
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In the limit At — > 0, we get from the above 



whose solution is, 



since the initial condition, 



dP 

-^ = -X{l-z)P{z,t), 



P{z,t) = exp [-A(l - z)t] 



P{z,t = 0) = ^z"P(n,t = 0) 

n=0 
oo 



n=0 
1. 



Taylor expanding the right hand side of Eq. (^), we get P{n,t) as the coefficient of 

P{n,t) = i^exp(-At). 



nl 



Fig. m depicts the Poisson distribution. 
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Figure 2: Poisson distribution, sec Eq. (10) with Xt = 2. 
Continuous distributions 



What we have seen above are a few examples of discrete random variable that can take 
either a finite number or at most countable number of possible values. Often we have to 
consider random variables that can take values in an interval. The sample space is a con- 
tinuum. Then we say that f{x)dx is the probability of the event for which the random 
variable X(uj) takes a value between x and x + dx. i.e. f{x)dx = V[uj\x < X{uj) < x + dx]. 
We call /(x) the probability density function or the probability distribution func- 
tion. 

Uniform distribution 

An example is the uniform random variable, U{a,b), defined in the interval [a,b]. The 
probability density function, f(x) of the random variable U{a,b) is defined by f{x)dx = 
dx/{h — a). We shall come across the random variable f/(0, 1) later in the context of pseudo 
random numbers and their generators. 



0.8 




-5 -4 -3 -2 -1 1 2 3 4 5 

X 



Figure 3: Gaussian distribution, see Eq. ( |Tl| ) with = 0, and a = .5 (--•), 1-0 (-) and 2.0 (- -) 
Gaussian distribution 

A probability density function we would often come across is the Gaussian. This density 
function is of fundamental importance in several physical and mathematical applications. 
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It is given by, 



G{x) 



1 



crV27r 



exp 



2C72 



-oo < X < +00, 



(11) 



where /x and a are the parameters, called the mean and standard deviation. Fig. |3| depicts 
the Gaussian density for /i = and a = 0.5, 1.0 and 1.5. The Gaussian density function 
plays a central role in the estimation of statistical errors in Monte Carlo simulation, as we 
would see later. 



Exponential distribution: 

The exponential density arises in the context of several physical phenomena. The time 
taken by a radioactive nucleus to decay is exponentially distributed. The distance a 
gamma ray photon travels in a medium before absorption or scattering is exponentially 
distributed. The exponential distribution is given by. 



fix) 



ae 



0, 



for X > 



for a; < 



where a > is a parameter of the exponential distribution. Fig. 
tial distribution, with a = 1. 



(12) 

§ depicts the exponen- 
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Cauchy distribution 

An interesting distribution named after Caucliy is given by, 

•^(^) = - n2^ 2 ' <x < +00, (13) 

wliere > is a scale factor. Fig. ^ depicts the Cauchy distribution with D = 1. 

0.35 1 \ I I I I \ \ \ \ 




X 

Figure 5: Cauchy distribution, see Eq. ( p^ with D = 1. 

If a random variable 6 is uniformly distributed in the range — 7r/2 to +tt/2, then 
X = tan(0) follows a Cauchy distribution (D = 1). If Xi and X2 are independent Caussian 
random variables each with mean zero and standard deviation ai and (T2 respectively, then 
the ratio X = X1/X2 is Cauchy distributed with D = 02/ (Ji- The Cauchy distribution is 
also known in the literature as Lorentz distribution or Breit-Wigner distribution and 
arises in the context of resonance line shapes. 

4 MEAN, VARIANCE AND COVARIANCE 

Mean 

The mean (also called the expectation value, the average, or the first moment) of a random 
variable X{ijj), with a probability density /(x) is denoted by the symbol /x and is defined 
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as, 

/ + 00 
X f{x)dx. (14) 
-oo 

Mean is the most important single parameter in the theory of probabihty; indeed in Monte 
Carlo calculations we would be interested in calculating the mean of some random variable, 
defined explicitly or implicitly. Let us repeat the experiment times and observe that the 
outcomes are loi,u!2, ■ ■ ■ ,con- Let the value of the random variable X{uj) corresponding to 
these outcomes be xi,X2, ■ ■ ■ respectively. If N is sufficiently large then Q2k=i ^k)/N 
approximately equals ^. In the limit N ^ oo the arithmetic mean of the results of N 
experiments converges to 

Statistical convergence 

At this point it is worthwhile discussing the meaning of statistical convergence vis-a- 
vis deterministic convergence that we are familiar with. A given sequence {A{i'); v = 
1, 2, ■ • •} is said to converge to a value, say i3 as ^ oo, if for (an arbitrarily small) 
6 > 0, we can find an M such that for all k > M, A{k) is guaranteed to be within 5 oi B 
. In the statistical context the term guarantee is replaced by a statement of probability, 
and the corresponding definition of convergence becomes: A{i') is said to converge to B 
as ^ cx), if given probability < P < 1 and (an arbitrarily small) 5 > 0, we can find 
an M such that for all k > M, the probability that A{k) is within 6 of B is greater than 
P. This risk that convergence can only be assured with a certain probability is an inherent 
feature of all Monte Carlo calculations. We shall have more to say on this issue when we 
take up discussions on the Chebyshev inequality, the law of large numbers and eventually 
the central limit theorem which help us appreciate the nature and content of Monte Carlo 
errors. For the moment it suffices to say that it is possible to make a quantitative prob- 
abilistic statement about how close is the arithmetic mean, A{i') of v experiments to the 
actual mean fi, for large u; such a statistical error estimate would depend on the number 
of experiments (z^) and the value of the second most important parameter in the theory 
of probability namely, the variance of the random variable, X, underlying the experiment. 

Variance 

Variance, cr^ is defined as the expectation value of {X — /x)^. Formally we have, 

/+0O 
(x - fiff{x)dx. (15) 
-oo 

The square root of the variance is called the standard deviation. 
Moments 

We can define, what are called the moments of the random variable. The K**^ moment 
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is denoted by Mk, and is defined as, 

/+00 
x^f{x)dx. (16) 
-oo 

It is clear that Mq = 1, which imphes that f{x) is normahzed to unity, Mi = /i, and 
0-2 = M2 - Ml. 

Cumulative probability distribution 

The cumulative probability density function, denoted by F{x) is defined as 

F{x) = V [uj\X{uj) <x\= r f{x')dx. (17) 

^—00 

F{x) is a monotonic non-decreasing function of x; F{—oo) = and F{oo) = 1. 
Sum of two random variables 

Let us consider two random variables Xi and X2. Let I2 = Xi + X2. We have, 

/+00 /■+00 
dxi dx2{xi + X2)f{xi,X2), (18) 

-00 J — 00 

where /(xi, X2) denotes the joint density of the random variables Xi and X2. Let fi{x) 
and f2{x) denote the probability density functions of Xi and X2 respectively. These are 
called marginal densities and are obtained from the joint density as follows. 

/+00 
dX2f{xi,X2). 
-00 

/ + 00 
dxif{xi,X2). (19) 
-00 



The integral in Eq. (18) can be evaluated and we get, 

f+oo r+oo 



/+00 n+00 
dxi Xl I dX2 f{xi,X2) + 
"OO J ~oo 



00 J ~oo 

+00 r+oo 

dxi / dx2 X2 f{xi,X2) 



00 



+00 r+oo 

dxi Xl fl{xi) + / dx2 X2f2{x2) 
-00 J —oo 

= fi{Xi) + fi{X2). (20) 

The means thus add up. The variance however does not, since it involves the square of 
the random variable. We have, 

(tHY2) = a\Xi)+a^iX2) + 2x cov[Xi,X2]. (21) 
13 



The last term in the above is the covariance of Xi and X2, and is given by 



/+00 n+00 
dxi I dx2 [xi - /ii] [x2 - At2] /(2;i,X2), (22) 

where /ii = niXi) and //2 = ^(-^^2)- One can define the conditional density of Xi given 
that X2 takes a value, say X2, as 

/c(xiM = -^^. (23) 

If fc{xi\x2) = fi{xi), then we find that f{xi,X2) = fi{xi) x f2{x2)- The random vari- 
ables Xi and X2 are then independent. In that case we find from Eq. (^) that 
cov(Xi, X2) = 0. Thus the covariance is zero if the two random variables are independent. 
If the covariance is positive then we say that the two random variables are positively cor- 
related; if negative, the random variables are negatively correlated. Note that two random 
variables may he uncorrelated i.e. covariance is zero) hut they need not he independent; 
however if they are independent, they must he uncorrelated. One usually considers the 
dimension - less normalized quantity called the correlation coefficient defined as, 

a[Xi)a[X2) 
It is easily verified that -1 < C(Xi, X2) < +1. 

Let us calculate the mean and variance of the distributions we have seen so far: 
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Binomial distribution 

For the toss of single fair coin, 

1^ = ^x(+l) + ^x(-l) = 0, 

a' = i X (+1)2 + ix (-1)2 = 1. (25) 
For the toss of A'" independent fair coins, the random variable Yat = {Xi+X2-\ h-^jv) 

has 

n=0 ^ ' 

and 

1 ^ ATI 
n=0 ^ ' 

The random variable Yat has a simple physical interpretation in terms of a random 
walk on a one-dimensional lattice. The lattice sites are on, say x axis, at unit intervals. 
You start at the origin. Toss a fair coin. If it shows up Heads then step to the right and 
go to the lattice site x = 1; if toss is Tails, then step to the left and go to the lattice site 
X = —\. At the new site, toss the coin again to decide whether to go the left site or the 
right site. After A'' tosses find out where you are on the axis, and your position defines the 
random variable Y^r. We can replace the number of coins or the tosses by time t and say 
x{t) is the position of the random walker after time t, given it started at origin at time 
zero. Average of x(f) is zero for all times. You do not move on the average. The variance 
of x{t) denoted by cP'ii) increases linearly with time and the proportionality constant is 
often denoted by ID where D is called the diffusion constant. Thus, for the random walk 
generated by the tossing of fair coins, the diffusion constant is one-half. 

On the other hand, if we consider the random variable Y/v = Y^r/iV, we find its mean is 
zero and standard deviation is 1/ \/]V. Notice that the standard deviation of Y/v becomes 
smaller and smaller as N increases. 

Consider now the random variable Y/^ = Y^r / ; it is easily verified that its mean is 
zero and variance is unity, the same as that for the single toss. Thus \fN seems to provide 
a natural scale for the sum of A'^ independent and identically distributed random variables 
with finite variance. The reason for this would become clear in the sequel, see section 8. 

For the case of a single loaded coin (with -p as the probability for Heads and g = 1 — p 
as that for the Tails) we have, 

^ = px (+l) + gx (-1) =p-g = 2p-l, (28) 

and 

(7^ = p X (+1)2 + g X (-1)2 -{p- qf = Ap{l - p). (29) 
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For the toss of N independent and identically loaded coins, the random variable Ijv = 
Xi + X2 + ---Xn has, 



^ ATI 
n=0 ^ ' 

and 

^ /VI 

n=0 ' 

We notice that both the mean and the variance increase linearly with N. The relevant 
quantity is the standard deviation relative to the mean; this quantity decreases as 1/ \/iV- 
The tossing of TV" loaded coins defines a biased random walk: there is a drift with velocity 
2p — 1 and a diffusion with a diffusion constant D = a'^ /2N = 2p{l — p). 

We can also consider the random variable Yat = Yn/N. Its mean is — 1 and is 
independent of iV. Its standard deviation, however, decreases with N as I/s/N. Consider 
now the natural scaling: = Y^/s/N; we find its mean is (2p — 1)-\/N and variance 

4p(l —p). The variance is independent of N. 

Poisson distribution 

The mean of the Poisson distribution is given by 

m(0 = E = (32) 

n=0 

and the variance is given by, 

a\t) = f^{n- Xtf^-^e-^' = Xt. (33) 

Thus, for the Poisson distribution the mean and the variance have the same magnitude. 

Exponential distribution 

The mean of the exponential density is given by 

i-oo I 

fi = xaexp{—ax) dx = — , (34) 
Jo OL 

and the variance is, 

= [ (x ^ aexp(—ax) dx = ( —\ ■ (35) 

Jo \ aj \aj 

The standard deviation of the exponential density is a = 1/a. We can associate the 
standard deviation with something like the expected deviation from the mean of a number. 
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sampled randomly from a distribution. We shall take up the issue of random sampling 
from a given distribution later. For the present, let us assume that we have a large set 
of numbers sampled independently and randomly from the exponential distribution. The 
expected fraction of the numbers that fall between /x — cr = and /u + o" = 2/a can be 
calculated and is given by 

Vifi -a<x<fi + a) = / ae dx 

J fi—a 

= 0.8647. (36) 

Thus, nearly 86% of the sampled numbers are expected to lie within one sigma deviation 
from the mean. Of course there shall also be numbers much larger than the mean since 
the range of x extends up to infinity. 

The distribution of the sum of independent exponential random variables and the 
same scaled by N and \/iV will be considered separately later. In fact I am going to 
use this example to illustrate the approach to Gaussian as dictated by the Central Limit 
Theorem. 



Cauchy distribution 



Let us consider the Cauchy distribution discussed earlier. We shall try to evaluate the 
mean by carrying out the following integration, 

1 f+°° D 

The above integral strictly does not exist. The reason is as follows. If we carry out the 
integration from — oo to and then from to oo, these two integrals do not exist. Notice 
that the integrand is a odd function. Hence if we allow something like a principal value 
integration, where the limits are taken simultaneously, we can see that the integral from 
— oo to cancels out that from to +oo, and we can say the mean, /i is zero, consistent 
with the graphical interpretation of the density, see Fig. ^. If we now try to evaluate the 
variance, we find. 

The above is an unbounded integral. Hence if we sample numbers randomly and indepen- 
dently from the Cauchy distribution and make an attempt to predict the extent to which 
these numbers fall close to the 'mean' , we would fail. Nevertheless Cauchy distribution 
is a legitimate probability distribution since its integral from — oo to +oo is unity and for 
all values of x, the distribution function is greater than or equal to zero. But its variance 
is infinity and its mean calls for a 'generous' interpretation of the integration. Since the 
standard deviation for the Cauchy distribution is infinity, the width of the distribution is 
usually characterized by the Full Width at Half Maximum(FWHM) and it is 2D. 

Consider the sum Yat = Xi + X2 + ■ ■ ■ + Xn of independent Cauchy random vari- 
ables. What is the probability distribution of the random variable Yat ? Does it tend to 
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Gaussian as A'^ — > oo ? If not, why ? What is the natural scahng for the sum Yyv ? These 
and related questions we shall take up later. Bear in mind that the Cauchy distribution 
has unbounded variance and this property will set it apart from the others which have 
finite variance. 



Gaussian distribution 



For the Gaussian distribution the two parameters fi and a are the mean and standard 
deviation respectively. Let us calculate the probability that a number sampled from a 
Gaussian falls within one sigma interval around the mean. This can be obtained by 
integrating the Gaussian from /x — cr to /x + cr. We get, 

1 /■M+o' 

V{lJ, — u<x<ii + a) = — -= I exp 

(TV 27r i/i-(T 

= 0.6826. (39) 

Thus 68% of the numbers sampled independently and randomly from a Gaussian distri- 
bution arc expected to fall within one sigma interval around the mean. This interval is 
usually called the one-sigma confidence interval. 

The sum Yn of N independent Gaussian random variables is also a Gaussian with 
mean N/i and variance Na'^. When you scale the sum by N, the mean is fi and the 
variance is cr^/N; on the other hand the random variable Yi^/\fN, has mean fi^/N and 
variance . These will become clear when we take up the discussion on the Central Limit 
theorem later. In fact we shall see that the arithmetic mean of N independent random 
variables, each with finite variance, would tend to have a Gaussian distribution for large 
N. The standard deviation of the limiting Gaussian distribution will be proportional to 
the inverse of the square root of A'^. We shall interpret the one-sigma confidence interval 
of the Gaussian as the statistical error associated with the estimated mean of the N inde- 
pendent random variables. 



2(72 



dx, 



Uniform distribution 



For the uniform random variable U{a,b), the mean is 

rb 



1 r , a+b 
and the variance is. 



= / \x — dx = (41) 

b-aJa \ 2 J 12 ^ ^ 

For the random variable f/(0, 1), the mean and variance are respectively 1/2 and 1/12. 
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Assignment 3 

Consider N independent uniform random variables U (0, 1). Let Ijv denote 
their sum. 

(A) Find the mean and variance of (a) Yn, (b) Y^/N and (c) Yn/^/N as 
a function N . 

(B) Derive an expression for the probabiUty distribution of Y]v for N = 2. 



5 CHARACTERISTIC FUNCTION 

The characteristic function of a random variable X is defined as the expectation value 
of exp(zA;X). 

^x{k) = / e^p{ikx)f{x) dx. 



Expanding the exponential we get, 

/+0O 
-oo 



,., , (ikxY (ikxY 
1 + {ikx) + + ■■■ + - — r- + • 



2! 



n! 



Assuming that term wise integration is valid, we find. 



{iky 



{iky 



^x{k) = 1 + ikMi + ^— + • • • + 

2! n\ 



Mr, 



from which we get, 



A;=0 



dx. 



(42) 



(43) 



(44) 
(45) 



Thus we can generate all the moments from the characteristic function. 

For a discrete random variable, the characteristic function is defined similarly as, 

^ik) = Y, exp{ikxn)Pn- (46) 



The logarithm of the characteristic function generates, what are called the cumulants or 
the semi-invariants. We have. 



oo / • i\n 

^^{k)=ln[^x{k)]^Y.^Cn, 



ni 



where C„ is the n-th cumulant, given by. 



k=0 



(47) 



(48) 
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How are the cumulants related to the moments ? 



This can be found by considering, 



hi 



1 + ikMi 



2! 



M2 + + ■ 



ikCi H — r^C2 H :t^L'3 + 



2! 



3! 



(49) 



Taylor-expanding the logarithm on the left hand side of the above and equating the coef- 
ficients of the same powers oi k, we get, 



Ci 
C2 
C3 
C4 



Ml 

M2 - = cr^ 

M3-3M2Mi-F2Mf, 

M4 - 4M3M1 - 3M| 12M2Mf - 6Mi^. 



(50) 
(51) 
(52) 
(53) 



The first cumulant is the same as the first moment (mean). The second cumulant is the 
variance. 

A general and simple expression relating the moments to the cumulants can be obtained 
[11 1 as follows. We have, 

d^{k) ^^,.d^{k) 



<^{k)- 



(54) 



dk dk 

where ^{k), see Eq. (44) and "^{k), see Eq. ( |47| ) are the moments and cumulants gen- 
erating functions respectively; we have dropped the suffix X for convenience. We see 
that, 



d"+^$(fc) 

from which it follows that, 

Ci = Ml 

Cn+l = Mn+1 



dk^ 



d'^ik) 
dk 



E 



n\ d"-'"^>(A;) d™+i^'(A:) 



(55) 



n-l 



E 



n! 



m! (n — m)! 



M„ 



Cm+i V n > 0. 



(56) 



Let us calculate the characteristic functions of the several random variables considered so 
far: 
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Binomial distribution 



The characteristic function of the random variable defined for the toss of a single fair coin 
is 

= cos{k). (57) 

The characteristic function of Yat = Xi+X2+- ■ ■ X^, defined for the toss of N independent 
fair coins is given by, 

1 ^ N\ 
'^Y^W 2^^,n!(7V-n)!'^ 



= [cos(A;)]^ 

= [$x(A;)]^. (58) 
Sum of independent random variables 

A general result is that the characteristic function of the sum of independent random 
variables is given by the product of the characteristic functions of the random variables. 
Let X\,Xii - ■ ■ X]s! be independent and not necessarily identically distributed. Let Yat = 
Xi + X2 + • • • + X]<j. Formally, the characteristic function of Y/v, denoted by $yjy(A;) is 
given by 

^Yt^{k) = j dxi j dx2 ■■■ 

■ ■ ■ j dxN exp [ik {xi+ X2-\ h xn)] fixi, X2, - ■ ■ , xn)- (59) 

Since Xi, X2, • • • , Xjv are independent, the joint density can be written as the product of 
the densities of the random variables, i.e., 

f{xi,X2,---XN) = fl{xi)f2{x2)--- fN{xN), (60) 

where fi{xi) is the probability density of Xi. Therefore, 

N 

^Yivik) = Yi dxi exjp (ikxi) fi{xi) 
1=1'' 

N 

= U'^x^k), (61) 



i=l 
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where ^Xiik) denotes the characteristic function of the random variable Xi. If these N 
random variables are also identical, then $y^(fc) = [^x{k)]^ . 

Arithmetic mean of independent and identically distributed random variables 
Let 

- Xi+X2-\ \-Xn , . 

Yn = , (62) 

define the arithmetic mean of N independent and identically distributed random variables. 
In Eq. ([59|), replace k by k/N; the resulting equation defines the characteristic function 
of Yn- Thus we have, 

^y.(^) = ^vAk/N) = [<^>x{k/N)f . (63) 
Sum of N independent and identically distributed random variables scaled by 



Later, we shall have occasions to consider the random variable Y y^, defined by. 



whose characteristic function can be obtained by replacing khy k/\/N m Eq. (|59D . Thus, 

/ — r / — 1^ 
^Y^{k) = ^YAk/^)= ^x{k/VN) . (65) 



Exponential distribution 

For the exponential distribution, see Eq. ([T2|), with mean unity (i.e. a = 1), the 
characteristic function is given by, 





The sum Yn of N exponential random variables has a characteristic function given by. 
The random variable Y^ has a characteristic function given by. 
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which has been obtained by replacing k by k/N in Eq. ([67|). 
Poisson distribution 



For the Poisson distribution, see Eq. (^^}, the characteristic function is given by, 



oc) ikri ( \ j.\n „— \t 

Hk,t) = T.- ^ ^ 

n=0 



exp 



(69) 



which is the same as Eq. (^) if we set z = exp(i/c). 
Gaussian distribution 

For the Gaussian distribution, the characteristic function is given by, 

^x{k) = exj){ink - ]^a'^k^). 
Following the rules described above, we have, 

^Ym (k) = exp 
^yjk) = exp 
$ 



iNnk - -Na'^k^ 



ifj,k 



2 N 



exp 



i^nk - ^a^k'^ 



(70) 

(71) 
(72) 
(73) 



For a Gaussian, only the first two cumulants are non-zero, identically zero. From Eq. (73) 
we see that the sum of N Gaussian random variables scaled by ^/N, is again a Gaussian 
with variance independent of N. 



Assignment 4 

Derive expressions for the characteristic functions of (a) the Gaussian and 
b) the Cauchy distributions. 



6 CHEBYSHEV INEQUALITY 

Let X be an arbitrary random variable with a probability density function f{x), and finite 
variance o"^. The Chebyshev inequality, see Papoulis 0], says that. 



V{\X > ka} < 



1 



(74) 
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for k > 1. Thus, regardless of the nature of the density function /(x), the probabihty that 
the random variable X takes a value between /U — e and /U + e, is greater than 1 — (cr^/e^), 
for € > a. The Chebyshev inequality follows directly from the definition of the variance, 
as shown below. 



/+00 
[x — dx 
-oo 



— oo 
u— fccr 

2 . 



> / {x-nYfix) dx+ {x-fiyf{x)dx 

fi—kcr r+ca 

f{x) dx + f{x) dx 



= k^a^V{\X - fx\>ka} . (75) 

The Chebyshev inequality is simple; it is easily adapted to sums of random variables and 
this precisely is of concern to us in the Monte Carlo simulation. Take for example, 
independent realizations of a random variable X with mean zero and variance a^. Let 
Yn be the arithmetic mean of these realizations. Yat is a random variable. The mean 
of Y)v is zero and its variance is a'^ /N. Chebyshev inequality can now be applied to the 
random variable Y^- Accordingly, a particular realization of the random variable Yat will 
lie outside the interval (— e,+e) with a probability less than or equal to a'^/{Ne^). Thus, 
as e becomes smaller, by choosing N adequately large we find that a realization of Y^ can 
be made to be as close to the mean as we desire with a probability very close to unity. 

This leads us naturally to the laws of large numbers. Of the several laws of large 
numbers, discovered over a period two hundred years, we shall see perhaps the earliest 
version, see Papoulis |^], which is, in a sense, already contained in the Chebyshev inequal- 
ity. 



7 LAW OF LARGE NUMBERS 

Consider the random variables Xi, X2, • • • , X^ which are independent and identically dis- 
tributed. The common probability density has a mean /i and a finite variance. Let Yat 
denote the sum of the random variables divided by A^. The law of large numbers says that 
for a given e > 0, as — > 00, 

V{\YN-^l\ >e} ^0 . 

It is easy to see that a realization of the random variable Yat is just the Monte Carlo 
estimate of the mean /i from a sample of size A^. The law of large numbers assures us that 
the sample mean converges to the population mean as the sample size increases. I 
must emphasize that the convergence we are talking about here is in a probabilistic sense. 
Also notice that the law of large numbers does not make any statement about the nature 
of the probability density of the random variable Yat. It simply assures us that in the 
limit of A^ — > 00, the sample mean would converge to the right answer. Yat is called the 
consistent estimator of /U. 
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The central limit theorem on the other hand, goes a step further and tells us about 
the nature of the probability density function of Yjy, as we shall see below. 



8 CENTRAL LIMIT THEOREM (CLT) 

Let Xi,X2, ■ ■ ■ Xn be N independent and identically distributed random variables having 
a common Gaussian probability density with mean zero and variance a'^, given by, 



Gx(x) 



1 



aV2TT 



exp 



X 

'2^ 



(76) 



Let Y]sr = Xi + X2 + • • ■ Xi\f. It is clear that the characteristic function of Y/v is the 
characteristic function of X raised to the power N. Thus, from Eq. (|70|), 



= exp 



N 



2 



(77) 



Fourier inverting the above, we find that the probability density of the random variable 
Yat is Gaussian given by. 



GYr,{x) 



1 



a^/2'KN 



exp 



X 



2iVo-2 



(78) 



with mean zero and variance Na"^. Thus when you add Gaussian random variables, the 
sum is again a Gaussian random variable. Under addition, Gaussian is stable. The scaling 
behaviour of the distribution is evident, 



Ari/2 



(79) 



The variance of the sum of independent and identically distributed Gaussian random 
variables increases (only) linearly with A^. On the other hand the variance of Y/v is c^/iV, 
and thus decreases with A^. Therefore the probability density of is, 



G 



N 



Y 



^ crV27f 



exp 



(80) 



The Central Limit Theorem asserts that even if the common probabiUty den- 
sity of the N random variables is not Gaussian, but some other arbitrary 
density with (zero mean and) finite variance, Eq. (|80|) is still valid but in the 

limit of — > 00. To see this, write the characteristic function of the common probability 
density as 



Akx 



f{x) dx 

1 

6 
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-Mik'^ 



(81) 
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Hence the characteristic function of is, 



^yjk) = [^x{k/N)f 
exp 



1 + i 



M^k 1 M4A;2 1 



2 N 



3o-2 AT 12cj2 Af2 
(for N ^ oo), 



AT 



(82) 



whose inverse is the density given by Eq. (^), see van Kampen |13]. 

The above can be expressed in terms of cumulant expansion. We have for the random 
variable X, 



^x{k) = exp[^'x(A;)] 
= exp 



(83) 



where Ci,C2,C3-- - are the cumulants of X. Then, the characteristic function of the 
random variable Yat is given by, 



^Yjk) = ['^x{k/N)] 
= exp 



N 



2 N 

We immediately see that for — > oo. 



zkCi — - I 1 + 



3 C2 iV I2C2 



(84) 



y-jv ~ exp 



2 iV 



(85) 



whose Fourier- inverse is a Gaussian with mean Ci, and variance C2/N. 

For the random variable y defined as the sum of N random variables divided by 
\/iV, the characteristic function can be written as. 



^x{k/VN) 



N 



exp 



exp 



ikVNC- 



k^C2 



(for N ^ 00) 



(86) 



which upon Fourier inversion gives a Gaussian with mean Ci\/ N and variance C2. The 
variance of is independent of N . 
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8.1 Levy Distributions 

In the beginning of the discussion on the Central Limit Theorem, I said when you add 
Gaussian random variables, what you get is again a Gaussian. Thus Gaussian is stable un- 
der addition. This is a remarkable property. Not all distributions have this property. For 
example, when you add two uniform random variables, you get a random variable with 
triangular distribution. When two exponential random variables are added we get one 
with a Gamma distribution, as we would see in the next section where we shall be investi- 
gating the behaviour of the sum of several independently distributed exponential random 
variables. (In fact we went a step further and found that when we add N identically 
distributed independent random variables with finite variance, the resulting distribution 
tends to a Gaussian, when — > oo. This is what we called the Central Limit Theorem). 

A natural question that arises in this context is: Are there any other distributions, 
that are stable under addition ? The answer is yes and they are called the Levy stable 
distributions, discovered by Paul Levy |jl2| in the mid twenties. Unfortunately the 
general form of the Levy distribution is not available. What is available is its characteristic 
function. Restricting ourselves to the symmetric case, the characteristic function of a Levy 
distribution is given by, 

L(A;;a) = exp(-L>|A;|'"), (87) 

where -D > is a scale factor, k is the transform variable corresponding to x, and a is the 
Levy index. The Fourier inverse of L(k;a) is the Levy distribution L{x;a). Levy showed 
that for L{x; a) to be non negative, < a < 2. In fact we have, 

r D-V" ^ for X = 0; 

L(x;a) ~ < D|x|-°-i , for |x| ^ oo; a < 2 ;. (88) 

I exp(-xV4£') , V X and a = 2 

The pre factor in the above can be fixed by normalization. We see that Gaussian is a 
special case of the Levy distributions when a = 2. In fact Gaussian is the only Levy 
distribution with finite variance. Earlier we saw about Cauchy distribution. This is again 
a special case of the Levy distribution obtained when we set a = 1. 

Consider independent and identically distributed Levy random variables, with the 
common distribution denoted by L{x; a). The sum Yj\f is again a Levy distribution denoted 
by Li\i{x; a), obtained from L{x; a) by replacing D by ND. Let us scale Yat by N^^" and 
consider the random variable Y^/N^^"- Its distribution is L{x;a). Thus A^-*^/" provides a 
natural scaling for Levy distributions. We have, 

L^(x; a) = N-'/'^Lx a) • (89) 

The scaling behaviour fits into a general scheme. The Gaussian corresponds to a = 2; the 
natural scaling for Gaussian is thus ^/N. The Cauchy distribution corresponds to a = 1; 
the natural scaling is given by A^. 

Thus Levy stable law generalizes the central limit theorem: Levy distributions are 
the only possible limiting distributions for the sums of independent identically distributed 
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random variables. The conventional Central Limit Theorem is a special case restricting 
each random variable to have finite variance and we get Gaussian as a limiting distribution 
of the sum. Levy's more general Central Limit Theorem applies to sums of random 
variables with finite or infinite variance. 

Stable distributions arise in a natural way when 1) a physical system evolves, the evo- 
lution being influenced by additive random factors and 2) when the result of an experiment 
is determined by the sum of a large number of independent random variables. 



8.2 Illustration of the CLT 



I shall illustrate the approach to Gaussian of the distribution of the mean, by consider- 
ing exponential random variables for which all the relevant quantities can be obtained 
analytically. We start with the identity, 



1 



a — ik 



(90) 



We differentiate (A^ — 1) times, both sides of the above with respect to a and set a = 1 
We get, 

1 



dx e 



ikx 



{N-l)l 



(91) 



We immediately identify the right hand side of the above, as the characteristic function of 
the sum of independent and identically distributed exponential random variables with 
mean unity, see Eq. (^). In Eq. (^Tj) above, we replace, on both sides, k by k/N, x by 
Nx, and dx by Ndx, to get. 



dx e 



ikx 



-NxjyN^N-1 



1 



(1-4) 



N ■ 



(92) 



The right hand side of the above is the characteristic function of Y/v see Eq. (|6q). Thus 
we get an exact analytical expression for the probability density function of Y^, as 



fix) 



N 



N 



(iV-1)! 



< X < CXD, 



(93) 



which, for all A^, is a gamma density function, and not a Gaussian! 

The cumulants of the gamma density can be obtained by power series expansion of the 
logarithm of the characteristic function given by Eq. (^). We get, 



-iVln(l-§ 

n=l 



E 

n=l 



{iky 



-{n-iy: 



nl 



N 



n-l 



28 



ik + 



{ikf 1 
2! iV 



i/fc 2! 



(94) 



The n-th cumulant is thus, 



= (95) 



(_n-l)! 

We find Ci = 1 and C2 = as expected. The third cumulant is C3 = and 

it goes to zero as — > 00. In fact all the higher cumulants vanish in the asymptotic 
limit. (Indeed even the second cumulant goes to zero in the limit N ^ 00, which means 
that the probability density of the arithmetic mean is asymptotically a delta function, 
something very satisfying from the point of view of a Monte Carlo practitioner!). The 
important point is that the Gaussian with mean unity and variance becomes a very 
good approximation to the gamma density, for large A^. 

Replacing k by k/^/N , x by x\/iV and dx by dx\fN in Eq. (^T|), we get, 



dx 6^^=^ 



(N-iy. 



^ (96) 



N ■ 



The right hand side of the above equation is the characteristic function of the random 
variable Y ^ = [Xi + X2 + • + Xiv)/\/iV- Thus, the probability density function of Y 
is given by the gamma density. 



fix) = x''-' exp(-^/iVx), (97) 

whose mean is ^/N and whose variance is unity (independent of A^). Fig. ^ depicts the 
gamma density given above for A'^ = 1, 10, 20, 40, 60 and 80 along with the Gaussian of 
mean \/iV and variance unity. For large A^ we find that the Gaussian is a good fit to the 
gamma density, for \x — ^/N\ << 1. 

In the discussions above on the central limit theorem, we have assumed the random 
variables to be independent and identically distributed. The central limit theorem holds 
well under much more general conditions. 

First, the random variables need not be identical. To see this, let A'^i of the A^ random 
variables have one distribution with say mean /xi and variance af. Let Yi denote their 
sum. Also let A'^2 of the N random variables have another distribution with mean ij,2 and 
variance al and let their sum be denoted by Y2. We take the limit A'"! ^ 00 and A'^2 — > 00 
such that N1/N2 remains a constant. Asymptotically (A"! — > 00) the random variable Yi 
tends to a Gaussian and the characteristic function is 4>i{k) = exp{ikNifii — k'^Nia\/2). 
Similarly the random variable Y2 tends to a Gaussian and the characteristic function 
is 4>2{k) = ex.p{ikN2H2 — k'^N2cr2/2). Their sum Y = Yi + Y2 has the characteris- 
tic function (j){k) = (piik) x (j)2{k), since Yi and Y2 are independent. Thus (pik) = 
exp {ik {Nijii + N2H2) — k'^ (A'^iff + A'^2C"2) Fourier inverse of (j){k) is a Gaussian with 
mean Ni^i + N2I-I2 and variance Niaf + N2(T2- 
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Figure 6: Gamma density and the asymptotic Gaussian 



Second, the random variables can be weakly dependent, see for e.g. Gardiner |14]. 
An example is the correlated random walk, described in Assignment (6). Essentially 
the adjacent steps of the random walk are correlated. Despite this, asymptotically the 
distribution of the position of the random walk is Gaussian, see for example []T5|. 
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Assignment 5 

Consider a random variable X with probability density f{x) = 
exp(— \/2|x|)/\/2, for — oo <x< +00. 

(A) What is the characteristic function of A ? 

Let Ai, A2, • • • Aat, be independent and identically distributed random 
variables with the common density f{x) given above. Let Y/v be the sum 
of these, i.e. Yjsi = J2f=i ^i- 

(B) What is the characteristic function of the random variable Y^r ? 
LetYN = Yn/N. 

(C) What is the characteristic function of Y/v ? 
Let Y/]v = Yn/VN. 

(D-a) What is the characteristic function of Y ^ ? 

(D-b) Taylor expand the characteristic function of Y ^ in terms of mo- 
ments and in terms of cumulants. 

(D-c) Employing the Moment and Cumulant expansion, demonstrate the 
approach to Gaussian as A ^ 00. 
(D-d) What is the mean of the asymptotic Gaussian ? 
(D-e) What is its variance of the asymptotic Gaussian 



? 



Assignment 6 

Consider one-dimensional lattice with lattice sites indexed by (• • • — 
3, —2, —1, 0, +1, +2, +3, • • •); a random walk starts at origin and in its first 
step jumps to the left site at —1 or to the right site +1 with equal prob- 
ability. In all subsequent steps the probability for the random walk to 
continue in the same direction is a and reverse the direction is 1 — q. Let 
x{n) denote the position of the random walk after n steps. Show that 
asymptotically (n — > 00) the distribution of x{n) is Gaussian. What is the 



mean of the limiting Gaussian? What is its variance? See |15] 



Thus we find that for the central limit theorem to hold good it is adequate that each 
of the random variables has a finite variance and thus none of them dominates the sum 
excessively. They need not be identical; they can also be weakly dependent. 

Therefore, whenever a large number of random variables additively determine a pa- 
rameter, then the parameter tends to have a Gaussian distribution. Upon addition, the 
individual random variables lose their characters and the sum acquires a Gaussian distri- 
bution. It is due to this remarkable fact that Gaussian enjoys a pre-eminent position in 
statistics and statistical physics. 

Having equipped ourselves with the above preliminaries let us now turn our atten- 
tion to random and pseudo random numbers that form the backbone of all Monte Carlo 



31 



simulations. 



9 RANDOM NUMBERS 

What are Random numbers ? 

We call a sequence of numbers random, if it is generated by a random physical process. 
How does randomness arise in a (macroscopic) physical system while its microscopic con- 
stituents obey deterministic and time reversal invariant laws ? This issue concerns the 
search for a microscopic basis for the second law of thermodynamics and the nature and 
content of entropy (randomness). I shall not discuss these issues here, and those interested 
should consult and the references therein. 

Instead we shall say that physical processes such as radioactivity decay, thermal noise 
in electronic devices, cosmic ray arrival times etc., give rise to what we shall call a sequence 
of truly random numbers. There are however several problems associated with generating 
random numbers from physical processes. The major one concerns the elimination of 
the apparatus bias. See for example [0] where description of generating truly random 
numbers from a radioactive alpha particle source is given; also described are the bias 
removal techniques. 

Generate once for all, a fairly long sequence of random numbers from a random physical 
process. Employ this in all your Monte Carlo calculations. This is a safe procedure. Indeed 
tables of random numbers were generated and employed in the early days of Monte Carlo 
practice, see for example The most comprehensive of them was published by Rand 
Corporation [^] in mid fifties. The table contained one million random digits. 

Tables of random numbers are useful if the Monte Carlo calculations are carried out 
manually. However, for computer calculations, use of these tables is impractical. The 
reason is simple. A computer has a relatively small internal memory; it cannot hold a 
large table. One can keep the table of random numbers in an external storage device like 
a disk or tape; constant retrieval from these peripherals would considerably slow down 
the computations. Hence, it is often desirable to generate a random number as and when 
required, employing a simple algorithm that takes very little time and very little memory. 
This means one should come up with a practical definition of randomness of a sequence 
of numbers. 

At the outset we recognize that there is no proper and satisfactory definition of ran- 
domness. Chaitin has made an attempt to capture an intuitive notion of randomness 
into a precise definition. Following Chaitin, consider the two sequences of binary random 
digits given below: 

10101010101010 

11010010110011 

I am sure you would say that the first is not a random sequence because there is 
a pattern in it - a repetition of the doublet 1 and 0; the second is perhaps a random 
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sequence as there is no recognizable pattern. Chaitin goes on to propose that a sequence 
of numbers can be called random if the smallest algorithm capable of specifying 
it to the computer has about the same number of bits of information as the 
sequence itself. 

Let us get back to the two sequences of binary numbers given above. We recognize that 
tossing a fair coin fourteen times independently can generate both these sequences. Coin 
tossing is undoubtedly a truly random process. Also the probability for the first sequence 
is exactly the same as that for the second and equals 2~^^, for a fair coin. How then 
can we say that the first sequence is not random while the second is ? Is not a segment 
with a discernible pattern, part and parcel of an infinitely long sequence of truly random 
numbers ? See |19] for an exposition of randomness along these lines. 

We shall however take a practical attitude, and consider numbers generated by a 
deterministic algorithm. These numbers are therefore predictable and reproducible; the 
algorithm itself occupies very little memory. Hence by no stretch of imagination can they 
be called random. We shall call them pseudo random numbers. We shall be content 
with pseudo random numbers and employ them in Monte Carlo calculations. We shall 
reserve the name pseudo random numbers for those generated by deterministic algorithm 
and are supposed to be distributed independently and uniformly in the range to 1. We 
shall denote the sequence of pseudo random numbers by {^^i, ^^2, • • •}• For our purpose it is 
quite adequate if one ensures that the sequence of pseudo random numbers is statistically 
indistinguishable from a sequence of truly random numbers. This is a tall order! How can 
anybody ensure this ? 



9.1 Tests for Randomness 

Usually we resort to what are called tests of randomness. A test, in a general sense, 
consists of constructing a function '4>{ri,r2, ■ • •), where ri, r2, • • •, are independent variables. 
Calculate the value of the function for a sequence of pseudo random numbers by setting 
= V i = 1, 2, • • •. Compare this value with the value that ijj is expected to have if 
{rj : i = 1, 2, • • •} were truly random numbers distributed independently and uniformly in 
the range to 1. 

For example, the simplest test one can think of, is to set 

1 ^ 

V'(n,r2,---) = ^E^^' (98) 

i=l 

which defines the average of N numbers. For a sequence of N truly random numbers 
we expect to lie between .5 — e and .5 + e with a certain probability p(e). Notice that 
for N large, from the Central Limit Theorem, ■0 is Gaussian with mean 0.5 and variance 
C7^ = (12A^)^^. If we take e = 2a = l/VSN, then p{e) is the area under the Gaussian 
between .5 — e and .5 + e and is equal to 0.95. This is called the two-sigma confidence 
interval. Thus for a sequence of N truly random numbers, we expect its mean to be 
within ibe around 0.5 with .95 probability, for large A^. If a sequence of N pseudo random 
numbers has an average that falls outside the interval (.5 — e, .5 + e) then we say that it 
fails the test at 5% level. 
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Assignment 7 



Carry out the above test on the random numbers generated by one of the 
generators in your computer. Docs it pass the test at 5% level ? 



Another example consists of setting, 



^^E^=l^r^+ik-(E^=lr^) 

V.(ri, r2, • • •) = C{k) = , (99) 



with A; = 0, 1, 2, • • • , A^ — 1. For a sequence of truly random numbers, the function ip above, 
which denotes two point auto correlation function, is expected to be unity for fc = and 
zero for A; 7^ 0. 



Assignment 8 

Calculate the auto correlation of the random numbers generated by one of 
the random number generators in your computer. 



In practice, one employs more complicated tests, by making a more complicated 
function of its arguments. 

An example is the run test which is more sensitive to the correlations. The idea 
is to calculate the length of a run of increasing (run-up) or decreasing (run-down) size. 
We say the run-down length is I if we have a sequence of random numbers such that 
> < ^,1-1 < < • • • ^2 < ^1- We can similarly define the run-up length. 

Let me illustrate the meaning of run-down length by considering a sequence of integers 
between and 9, given below. 



|6 4 1 3|4 5|3 2 7|4 8| 

I 3 I 1 I 2 I 1 I 

The first row displays the sequence; the second depicts the same sequence with numbers 
separated into groups by vertical lines. Each group contains n + 1 numbers the first n 
of which are in descending order, i.e. these numbers are running down. The descent is 
broken by the {n + \Y^ number which is greater than the n*^ number. The third row gives 
the run-down length, for each group, which is n. 

We can calculate the probability distribution of the run-down length as follows. Let 
P{L > I) be the probability that the run-down length L is greater than or equal to I. To 
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calculate P{L > I) we consider a sequence of / distinct random numbers. There are U 
ways of arranging these numbers. For a sequence of truly random numbers, each of these 
arrangements is equally likely. Of these, there is only one arrangement which has all the 
I numbers in the descending order. Hence P{L > I) = 1/U. Since P{L = I) = P{L > 
l)-P{L > / + we get 

Alternately, for the probability of run-down length we have, 

/•I /"Ci rii-\- 
P{L >l) = dii di2--- dii 

JO JO JO 



1 

n 



(101) 



In the test, we determine numerically, the distribution of the run length on the sequence 
of pseudo numbers and compare it with the exact distribution given by Eq. ( |100| ). Fig. ^ 
depicts the results of a run-up test. Description of several of the most common tests for 



randomness can be found in [21|. 



One issue becomes obvious from the above discussions. There is indeed an indefinitely 
large number of ways of constructing the function ip. Hence, in principle a pseudo random 
number generator can never be tested thoroughly for the randomness of the sequence of 
random numbers it generates. What ever may be the number of tests we employ and 
however complicated they may be, there can be, and always shall be, surprises. ...surprises 



like the Marsaglia planes discovered in the late sixties |22| or the parallel lines discovered 
some three years ago by Vattulainen and co-workers |22]. We shall discuss briefly these 
issues a bit later. 



Assignment 9 

Carry out (a) run up and (b) run down tests, on the random numbers 
generated by one of the random number generators in your computer and 
compare your results with the exact analytical results. It is adequate if 
you consider run lengths up to six or so. 



The struggle to develop better and better random number generators and the simul- 
taneous efforts to unravel the hidden order in the pseudo random numbers generated, 
constitute an exciting and continuing enterprise, see for example [^]. See also Bonelli and 
Ruffo [pi] for a recent interesting piece of work on this subject. 
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1 2 3 4 5 6 7 

run-up length, I 

Figure 7: Results of a run-up test. The histogram plot corresponds to what is expected from 
a sequence of truly random numbers. See Eq. (100). The points with error bars correspond to 
the ones obtained with random number generator routine RAND of MATLAB version 4.1. The 
run-up lengths were generated from a sequence of 10,000 numbers. The statistical errors in the 
calculated values correspond to one-sigma confidence interval. 



9.2 Pseudo Random Number Generators 

The earliest pseudo random number generator was the mid-squares proposed by von 
Neumann. Start with an m digit integer z^i. Take the middle m/2 digits and call it rji. 
Square rji and call the resulting m digit integer as 1^2 ■ Take the middle m/2 digits of 1^2 
and call it 772 . Proceed in the same fashion and obtain a sequence of integers i]i,r]2,- ' 
These are then converted to real numbers between zero and one by dividing each by 
For a properly chosen seed vi, this method yields a long sequence of apparently 
good random numbers. But on a closer scrutiny, it was found that the mid-squares is not 
a good random number generator. I shall not spend anymore time on this method since 
it is not in vogue. Instead we shall turn our attention to the most widely used class of 
random number generators called the linear congruential generators, discovered by 
Lehmer [^]. Start with a seed Ri and generate successive integer random numbers by 



Ri+i = aRi + b (mod m), 

where a, b, and m are integers, a is called the generator or multiplier; b. 
and m is the modulus. 



(102) 

the increment 



Equation (102) means the following. Start with an integer Ri. This is your choice. 
Calculate a x Ri + b. This is an integer. Divide this by m and find the remainder. Call 
it R2- Calculate a x R2 + b ; divide the result by m ; find the remainder and call it R3. 
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Proceed in the same fashion and calculate the sequence of integers {Ri, R2, R3, Ra, • • •}, 
initiated by the seed Ri. Thus {Ri : i = 1, 2, • • •} is a sequence of random integers. The 
above can be expressed as, 



iij+i = (a X Ri + b) 



(a X Ri) + b 



X m 



(103) 



m 



where the symbol [ r] ] represents the largest integer less than or equal to r/; e.g., [it] = 
3, [V2] = 1, [4.999] = 4, etc.. The random integers {Rj} can be converted to floating 
point numbers by dividing each by m. i.e., = Rj/m. Then {^j : i = 1,2, • • •} is the 
desired sequence of pseudo random numbers in the range to 1. 

The linear congruential generator is robust (over forty years of heavy-duty use!), simple 
and fast; the theory is reasonably well understood. It is undoubtedly an excellent technique 
and gives fairly long sequence of reasonably good random numbers, when properly used. 
We must exercise great care in the use of congruential generators, lest we should fall into 
the deterministic traps. Let me illustrate: 

Consider Eq. ( |102| ). Let us take a = 5, b = 1, Ri = 1, and m = 100. The results of 
the recursions are shown in Table (3). 

Table 3: Linear congruential recursion with a — 5, b — 1, m — 100, and i?i = 1. 



R2 = 


(5x 


1 


+1) 


(mod 


100) = 


6 


(mod 


100) = 


6 


R3 = 


(5x 


6 


+ 1) 


(mod 


100) = 


31 


(mod 


100) = 


31 


R4 = 


(5x 


31 


+1) 


(mod 


100) = 


156 


(mod 


100) = 


56 


R5 = 


(5x 


56 


+1) 


(mod 


100) = 


281 


(mod 


100) = 


81 


Rq = 


(5x 


81 


+ 1) 


(mod 


100) = 


406 


(mod 


100) = 


6 



We see from Table (3) that R^ = R2; R7 = R3, ■ ■ ■, and the cycle repeats. The period 
is four. We just get four random(?) integers! 

Consider another example with a = 5, 6 = 0, m = 100, and Ri = 1. Table (4) depicts 
the results of the linear congruential recursion. 

Starting with the seed 1, we get 5 followed by an endless array of 25, as seen from 
Table (4). These examples illustrate dramatically how important it is that we make a 
proper choice of a, b and m for decent results. 



1 
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Table 4: Linear congruential recursion with a — 5, b — 0, m — 100, and Ri = 1 



Ri 






i?2 = 


(5x 


1) 


i?3 = 


(5x 


5) 


i?4 = 


(5x 


25) 



(mod 100) = 5 
(mod 100) = 25 
(mod 100) = 125 



= 1 

(mod 100) = 5 

(mod 100) = 25 

(mod 100) = 25 



Clearly, whatever be the choice of a, b and m, the sequence of pseudo random integers 
generated by the linear congruential technique shall repeat itself after utmost m steps. The 
sequence is therefore periodic. Hence, in applications we must ensure that the number of 
random numbers required for any single simulation must be much less than the period. 
Usually m is taken very large to permit this. 

For the linear congruential generator, we can always get a sequence with full period, 
m, if we ensure: 

1. m and b are relatively prime to each other; i.e. gcd(m, b) = 1 : 

2. a = 1 (mod p) for every prime factor p of m; and 

3. a = 1 (mod 4), if m = (mod 4). 

For example, let a = 7, 6 = 13 and m = 18. Check that this choice satisfies the above 
conditions. The results of the linear congruential recursion with the above parameters are 
depicted in Table (5). 

Thus we get the sequence 

{1, 2, 9, 4, 5, 12, 7, 8, 15, 10, 11, 0, 13, 14, 3, 16, 17, 6} 

of eighteen distinct integers between and 17, with full period 18. Divide each number 
in this sequence by 18 and get a sequence of real numbers between and 1. That the 
period is maximum m only ensures that the numbers are uniform in the range to 1. The 
numbers may be correlated or may have a hidden order. 

Let us embed the above sequence in a two dimensional phase space. This is carried out 
as follows. Form two-dimensional embedding vectors: {^1,^,2), {^2,^,3), (Cmi^i)- We 
have thus m vectors and in our example m = 18. Each of these vectors can be represented 
by a point in two dimensional phase space. Fig. ^ depicts these eighteen points. We 
observe that the points fall neatly on parallel lines. 
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Table 5: Linear congruential recursion with a = 7, b = 13, m = 18 and Ri — 1 



Ri = 1 



-0.2 
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Rq 
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(7x 


5 


+ 13) 


(mod 


18) = 


48 


(mod 


18) = 


12 


Rj 




(7x 


12 


+ 13) 


(mod 


18) = 


97 


(mod 


18) = 


7 
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8 


+13) 


(mod 


18) = 


69 


(mod 


18) = 


15 


Rio 




(7x 


15 


+13) 


(mod 


18) = 


118 


(mod 


18) = 


10 


Rn 




(7x 


10 


+13) 


(mod 


18) = 


83 


(mod 


18) = 


11 


Rl2 




(7x 


11 


+13) 


(mod 


18) = 


90 


(mod 


18) = 





Rl3 




(7x 





+13) 


(mod 


18) = 


13 


(mod 


18) = 


13 


i?14 




(7x 


13 


+13) 


(mod 


18) = 


104 


(mod 


18) = 


14 


^15 




(7x 


14 


+13) 


(mod 


18) = 


111 


(mod 


18) = 


3 


Rl6 




(7x 


3 


+13) 


(mod 


18) = 


34 


(mod 


18) = 


16 


Rn 




(7x 


16 


+13) 


(mod 


18) = 


125 


(mod 


18) = 


17 


Rl8 




(7x 


17 


+13) 


(mod 


18) = 


132 


(mod 


18) = 


6 


Rl9 




(7x 


6 


+13) 


(mod 


18) = 


55 


(mod 


18) = 


1 



Take another example with a = 137, b = 187, and m = 256. We get a sequence of 
256 distinct integers between and 255 in some order starting from the chosen seed. Let 
us convert the numbers into real numbers between zero and one by dividing each by 256. 
Let {^i : i = 1, 2, • • • , 256} denote the sequence. Embed the sequence in a two-dimensional 
'phase space' , by forming two-dimensional vectors as discussed above. Fig. ^3 depicts 
the results of this exercise. The vectors clearly fall on parallel lines. 

That the random numbers generated by linear congruential generators form neat pat- 
terns when embedded in two and higher dimensional phase space was known since as early 
as the beginning of sixties, see for example [27|. But no one recognized this as an inher- 
ent flaw in the linear congruential generators. It was generally thought that if one takes 
a linear congruential generator with a very long period, one would not perhaps see any 
patterns |28|. 

Long periods can be obtained by choosing the modulus m large and by choosing 
appropriate values of the multiplier a and the increment b to get the full period. 
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Figure 8: Random numbers of the linear congruential generator embedded in a two di- 
mensional phase space, (a) depicts the results for the generator with a = 7, 6 = 13 and 
m = 18. (b) depicts the results for the generator with a = 137, b = 187 and m = 256. 
(Both these generators give full period) 



Assignment 10 



Construct a linear congruential generator that gives rise to a long sequence 
of pseudo random numbers with full period. Embed them in two or higher 
dimensional phase space and see if there are any patterns. Test for the 
randomness of the sequence. 



The modulus m is usually taken as 2*^^ — 1, where t is the number of bits used to 
store an integer and hence is machine specific. One of the t bits is used up for storing 
the sign of the integer. The choice a = 7^ = 16801, 6 = and m = 2^^ — 1, for a 32 
bit machine has been shown to yield good results, see ||2^. The Ahren generator which 
specifies a = 2*~-^(\/5 — l)/2, 6 = 0, has also been shown to yield 'good' random numbers. 
The linear congruential generators have been successful and invariably most of the present 
day pseudo random number generators are of this class. 



In the late sixties, Marsaglia |22| established unambiguously that the formation of lat- 



tice structure is an inherent feature of a linear congruential generator: 



If d-tuples (^1, ^2, • • • , Cd), (6, ^3, • • • , Cd+i), (6, ^4, • • • , • • •, of the random 

numbers produced by linear congruential generators are viewed as points in 
the unit hyper cube of d dimensions, then all the points will be found to lie 
in a relatively small number of parallel hyper planes. Such structures occur 
with any near congruential generator and in any dimension. 

The number of such Marsaglia planes does not exceed ((i!2*)^/'^. Besides the points 
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on several Marsaglia planes form regular patterns. Thus it is clear now that the existence 
of Marsaglia planes is undoubtedly a serious defect inherent in the linear congruential 
generators. 

Then there is the other matter of the presence of correlations, hopefully weak, in the 
sequence of pseudo random numbers. If your Monte Carlo algorithm is sensitive to the sub- 
tle correlations present in the sequence, then you are in problem. For example Ferrenberg, 
Landau and Wong ||3^, found that the so-called high quality random number generators 
led to subtle but dramatic errors in algorithm that are sensitive to the correlations. See 



also pl |. 



One should perhaps conclude that the linear congruential generators are not 
suitable for Monte Carlo work! Before we come to such a harsh conclusion, let us 
look at the issues in perspective. If your Monte Carlo program does not require more than 
a small fraction (smaller the better) of the full period of the random numbers, then most 
likely, the presence of Marsaglia lattice structure will not affect your results. All the linear 
congruential generators in use today have long periods. Also one can think of devices to 
increase the number of Marsaglia planes. For example the Dieter- Ahren 'solution' to the 
Marsaglia problem is the following algorithm, 

Ri+2 = aRi+i + hRi (mod m), (104) 

which requires two seeds. The modulus m = 2*~^ — 1. For a proper choice of a and 6, the 
number of Marsaglia planes can be increased by a factor of 2*/^^. 

Thus over the period of thirty years we have learnt to live with the lattice defect of 
the linear congruential generators. However at any time you get the feeling that there is 
something wrong with the simulation results and you suspect that this is caused by the 
Marsaglia lattice structure of the linear congruential generator then you should think of 
employing some other random number generator, like the inversive congruential gen- 
erator (ICG) proposed in 1986 by Eichenauer and Lehn |32| or the explicit inversive 



congruential generator (EICG), proposed in 1993 by Eichenauer-Hermann [^^. Both 
these generators do not suffer from the lattice or hyper plane structure defect. I shall 
not get into the details of these new inversive generators, except to say that the inversive 
congruential generator also employs recursion just as linear congruential generators do, 
but the recursion is based on a nonlinear function. 

More recently in the year 1995, Vattulainen and co-workers ||2^ found, in the context 
of linear congruential generators, that the successive pairs {£,t^it+i) exhibit a pattern of 
parallel lines on a unit square. This pattern is observed for all (^t, it+i) with / = 1, 2, • • •, 
where the modulus m = 2^^ — 1 for a 32 bit machine. The number of parallel lines strongly 
depends on I. For I = {m — l)/2 the pairs fall on a single line, and for / = {(m — l)/2} ± 1, 
the pairs fall on so large a number of parallel lines that they can be considered as space 
filling. This curious phenomenon of switching from pairs falling on one or a few parallel 
lines to pairs falling on several lines upon tuning the parameter I is termed as transition 



from regular behaviour to chaotic behaviour, see |34]. Thus there seems to be a connection 
between pseudo random number generators and chaotic maps. Chaos, we know, permits 
only short time predictability; no long term forecasting is possible. The predictability gets 
lost at long times at times greater than the inverse of the largest Lyapunov exponent. 
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It is rather difficult to distinguish a chaotic behaviour from a random behaviour. Thus 
Chaos can provide an effective source of randomness. This is definitely meaningful idea 
since chaotic and random behaviours have many things common, see for example j35[ |, 
where Chaos has been proposed as a source of pseudo random numbers. But then we know 
there is an order, strange (fractal) or otherwise, in Chaos. What exactly is the connection 
between the Marsaglia order found in the context of linear congruential generators and the 
fractal order in Chaos ? If answers to these and several related questions can be found, 
then perhaps we can obtain some insight into the otherwise occult art of pseudo random 
number generators. A safe practice is the following. Consider the situation when you are 
planning to use a standard Monte Carlo code on a new problem; or consider the situation 
when you are developing a new Monte Carlo algorithm. In either case, carefully test 
your Monte Carlo along with the random number generator, on several standard problems 
for which the results are reasonably well known. This you should do irrespective of how 
'famous' the random generator is, and how many randomness tests it has been put through 
already. Afterall your Monte Carlo programme itself can be thought of as a new test of 
randomness of the pseudo random number generator. 

I shall stop here the discussion on pseudo random number generators and get on 
to Monte Carlo. In the next section I shall discuss random sampling techniques that 
transform the pseudo random numbers, independent and uniformly distributed in 

the range to 1, to {xi}, having the desired distribution in the desired range. 

10 RANDOM SAMPLING TECHNIQUES 

The random sampling techniques help us convert a sequence of random numbers 
uniformly distributed in the interval (0, 1) to a sequence {xi} having the desired density, 
say f{x). There are several techniques that do this conversion. These are direct inversion, 
rejection methods, transformations, composition, sums, products, ratios, table-look-up of 
the cumulative density with interpolation, construction of equi-probability table, and a 
whole host of other techniques. In what follows we shall illustrate random sampling by 
considering in detail a few of these techniques. 

10.1 Inversion Technique 

The simplest is based on the direct inversion of the cumulative density function of the 
random variable X. We shall first consider sampling from a discrete distribution employing 
inversion technique. Let {pi : i = 1,2, ■ ■ ■ , N} be the discrete probabilities for the random 
variable X to take values Xi{= i). Fig. ^ depicts an example with N = 5. We first 
construct the cumulative probabilities {Pi : i = 0, A^} where Pq = 0; Pi = pi; P2 = 
Pi + P2 '■ ■ ■ ■ Pk = Pi + P2 + ■ ■ ■ Vk] Pn = 1- Fig- §b depicts the cumulative probabilities 
as staircase of non-decreasing heights. The procedure is simple. Generate ^, a random 
number uniformly distributed in the range (0, 1). Find k for which Pk-i < C < Pk- Then 
Xk{= k) is the sampled value of X. This is equivalent to the following. The value of ^ 
defines a point on the y axis of Fig. ^b between zero and one. Draw a line parallel to the 
X axis at (0,.^). Find the vertical line segment intersected by this line. The vertical line 
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segment is then extended downwards to cut the x axis at say xi. Then xi is the (desired) 
sampled value of X. 



0.5r 
0.4 



iiO.3 
n 
a 
n 

2 0.2 

Q. 



0.1 



(a) 



TO 

n 
o 

Q.0.6 

> 

roO.4 

u 

E 

30.2 



(b) 



2 4 

X 



Figure 9: Discrete distribution, (a) probabilities (b) cumulative probabilities 



Assignment 11 

Employing inversion technique sample from (a) binomial and (b) Poisson 
distributions. Compare the frequency distribution you obtain with the 
exact. 



The above can be generalized to continuous distribution. We have the cumulative proba- 
bility distribution F{x) = fix') dx' . Note that F(— oo) = and F{+oo) = 1. Given 
a random number ^j, we have xi = F~^{^i). 

An example with exponential distribution is shown in Fig. 11^. We have plotted in Fig. 



lOp, the exponential distribution for x > 0. Fig. |lOp depicts the cumulative distribution. 
Select a point on the y axis of Fig. p!o| b randomly between zero and one. Draw a line 
parallel to x axis passing through this point. Find where it intersects the curve F(x). 
Read off the x coordinate of the intersection point, and call this xi. Repeat the above 
several times and get {xi : i = 1,2, ■ ■ ■}. These numbers will be exponentially distributed. 
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Figure 10: (a) exponential distribution (b) cumulative distribution 



Assignment 12 

Devise a technique to sample from the distribution f{x) = 
exp(— \/2|a:|)/\/2 for — oo < a; < +00. Generate {xj : i = 1,2,---,N}. 
Sum up these numbers and divide by \/iV. Call this y. Generate several 
values of y and plot their frequency distribution in the form of a histogram. 
Carry out this exercise with N = 1, 2, • • • and demonstrate the approach 
to Gaussian as N becomes larger and larger. 



In fact for the exponential distribution we can carry out the inversion analytically. We 
set ^ = F{x). It is clear that the number F{x) is uniformly distributed between and 1. 
Hence the probability that it falls between F{x) and F{x) + dF{x) is dF{x), which is equal 
to f{x)dx. Hence x = F~^{^) is distributed as per f{x). For exponential distribution we 
have F{x) = 1 — exp(— x). Hence x = — log(l — ^). Since 1 — ^ is also uniformly distributed 
we can set x = — log(^). 



Assignment 13 

Generate N independent random numbers from an exponential distribu- 
tion. Sum them up and divide by ^/N ; call the result y. Generate a large 
number of values of y and plot their frequency distribution. Plot on the 
same graph the corresponding gamma density and Gaussian and compare. 
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Assignment 14 



Start with particles, indexed by integers i = 1,2, ■ ■ ■ , N . {e.g. N = 
1024). Initialize x{j) = A V j = 1,2,---,A^, where A is the desired 
exponential decay constant, {e.g., A = 1). The algorithm conserves 
J2iLi x{^) = Select independently and randomly two particles, say 

with indices i and j, and i ^ j. Let S = x{i) + x{j). Split S randomly 
into two parts. Set x{i) to one part and x{j) to the other part. Repeat 
the above for a warm up time of say 4 x iterations. Then every subse- 
quent time you select two particles {k and /), the corresponding x{k) and 
x{l) are two independent random numbers with exponential distribution: 
Aexp(— Ax) for < X < oo. 

(a) Implement the above algorithm and generate a large number of ran- 
dom numbers. Plot their frequency distribution and check if they follow 
exponential distribution. 

(b) Prove analytically that the above algorithm leads to independent and 
exponentially distributed random numbers in the limit — > oo. 



Analytic inversion to generate exponentially distributed random numbers is not neces- 
sarily the most robust and fast of the techniques. There are several alternate procedures 
for sampling from exponential distribution without involving logarithmic transformation, 
see for example |3^. The subject of developing ingenious algorithms for generating ex- 
ponentially distributed random numbers continues to attract the attention of the Monte 



Carlo theorists. For example, recently Fernandez and Rivero |37] have proposed a simple 
algorithm to generate independent and exponentially distributed random numbers. They 
consider a collection of N particles each having a certain quantity of energy to start with. 
Two distinct particles are chosen at random; their energies are added up; the sum is di- 
vided randomly into two portions and assigned to the two particles. When this procedure 
is repeated several times, called the warming up time, the distribution of energies amongst 
the particles becomes exponential. Note that this algorithm conserves the total energy. It 
has been shown that about one thousand particles are adequate; the warming up time is 
AN or so. For details see [37|; see also 



10.2 Rejection Technique 

Another useful random sampling technique is the so called rejection technique. The 
basic idea is simple. From a set of random numbers discard those that do not follow the 
desired distribution. What is left out must be distributed the way we want. 

Let f{x) be the desired distribution, defined over the range (a, /3). First select a 
suitable bounding function g{x) such that C x g{x) > f{x) for all values of a < x < /3. 
Also g{x) much be such that it is easy to sample from. Sample a set of {xj : i = 1, 2, • • • , A^} 
independently and randomly from ^(x). For each value of Xj select randomly a number 
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between and C x g{xi). Call this set {y(xi)}. From the set {xi : i = 1,2, ■ ■ ■ , N} discard 
those for which y{xi) > f{xi). The remaining Xi-s shall have the desired distribution /(x). 
The efficiency of a rejection technique is the percentage of the attempts that get accepted 
and is given by the inverse of the area under the curve C x g{x). Usually g{x) is taken as 
a constant for all x and C is the maximum value that f{x) takes. The rejection technique 
would be inefficient if C x g(x) and f{x) do not have similar shapes and ranges. 




0.5 1 0.5 1 



X 




Figure 11: Illustration of the rejection technique, (a) the circular density and the Cx the bounding 
function, (b) random points that uniformly fill up the rectangular box. (c) points that get accepted 
in the sampling procedure, (d) the distribution of the accepted values of x. 

Let me illustrate the rejection technique by considering the circular probability density 
function 

4 / 

f{x) = -Vl-x^ for < X < 1. (105) 
vr 

A convenient choice of the bounding function is g{x) = 1V 0<x<l. Thus sampling 
from g{x) is equivalent to setting Xi = ^j. The value of C is A/tt. Fig. pT^ depicts the 
density function /(x) and the function C x g{x). Generate a pair of random numbers 
iCii^j) from U{0, 1). Set Xi = ^j. Calculate = /(xj) and yi = C x ^j. If yi < fi, accept 
Xi. Repeat the above procedure several times. Fig. pd[ b depicts {{xi,yi);i = 1, 2, • • • , 1000} 
and these points are seen to fill up the rectangle bounded by the lines y = 0, y = 4/7r, x = 
and X = 1. Fig. pT| c depicts the accepted pairs {(xj,yi)}. Fig. [Tl| d depicts the histogram 
of the distribution of the accepted values of Xj . 

In the introduction, I mentioned of the 'pebble-throwing' game popular in the province 
of Monacco, from which the name Monte Carlo came. What has been described above 
is a straight forward adaptation of the same (Monte Carlo) game to sampling from a 
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(normalized) distribution. I said earlier that one can make an estimate of the value of vr 
from the Monte Carlo game. How do we do this ? 

In the rejection technique described above, we sample a point randomly from the 
rectangular region, see Fig. |l^a. In the next step we either accept the point or reject 
it. Thus there are only two outcomes to the experiment. The point either falls inside 
the 'circular distribution' curve (and hence gets accepted) with a probability p = tt/A 
or outside (and hence gets rejected) with a probability q = 1 — p = 1 — 7r/4. We are 
essentially tossing a loaded coin; the probability of Heads is p and of Tails q = 1—p. Let n 
denote the number of Heads in independent throws of a loaded coin. The distribution 
of the random variable n is Binomial. The quantity of interest to us is n/N, whose mean 
and standard deviation can easily be calculated, and are given by p and \/p{l — p)/^/N 
respectively. Thus we say the Monte Carlo estimate of vr is An/N it (A/N^^'^)y/n{N — n), 
where the error term is the one-sigma confidence interval. Fig. |l2| depicts the estimated 
value of TT with the one-sigma confidence error bar, as a function of the logarithm (to the 
base 2) of the number of trials N = 2^°, • • • 2^^. The statistical convergence to the 
right answer as the sample size increases is seen. 



Assignment 15 

Employ rejection technique to sample from f{x) = — x"^ for — 1 < 
X < 1. Plot the distribution of the accepted sequence of numbers in the 
form of a histogram and compare with the exact. Calculate the value of vr 
from the above experiment ? What is the statistical error ? 



10.3 Sampling from a Gaussian 

The Gaussian is the most important distribution in statistics and statistical physics. It is 
also one of the richest, in the sense, a very large number of techniques have been proposed 
to generate Gaussian random numbers. The simplest perhaps is the one that makes use 
of the Central Limit Theorem: take the sum of random numbers, uniformly distributed 
in the range (0, 1). Let Yat denote the sum. By Central Limit Theorem, the distribution 
of Y]\f will tend to a Gaussian in the limit N — > oo. The mean of the Gaussian is N/2 and 
its variance is N/12. Usually one wants a standard Gaussian with mean zero and variance 
unity. Therefore we define the quantity q, 

Yn - (N/2) , , 

^/NjVl ^ ' 

which has a Gaussian distribution of zero mean and unit variance for large N . A convenient 
choice is A = 12, which reduces q to 112 — 6: Take twelve pseudo random numbers and add 
them up; subtract six from the sum and you have a mean zero variance unity Gaussian 
random number. This method of generating Gaussian random numbers is not exact since 
the tails are absent. Note that the value of q is restricted to the interval (—6, +6). 



47 



3.2 
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Figure 12: Monte Carlo estimate of tt employing rejection technique 

There is a clever technique that transforms two independent uniform random variables 
^1 and ^2 to two independent Gaussian random variables qi and q2, as follows. 

qi = In ,^1 cos(27r^2), 

92 = \/-2 In 6 sin(27re2). (107) 



This method, called Box-Muller algorithm [^9[, is easy to program. This algorithm 
is quite popular. However it has a few drawbacks. Box-Muller algorithm uses several 
multiplication, one logarithmic function, one trigonometric function and one square root 
function; hence it is rather slow. Besides the tails differ markedly from the true when a 
linear congruential generator is used. 

Muller [^ ] gives an account of several methods of sampling from a Gaussian distribu- 
tion. 



Assignment 16 

Generate random numbers from C/(0, 1). Add them up and subtract 
A^/2. Divide the result by \JN jXI. Call this y. Generate a large num- 
ber of y and plot their frequency distribution. Take N = 2,3,4, •• •, and 
demonstrate the approach to Gaussian (of mean zero and variance unity). 



Recently Fernandez and Criado |41] have proposed a fast algorithm to generate Gaus- 



sian random numbers. Their algorithm is based on an N particles closed system interacting 



48 



two at a time, conserving energy. Start with particles each having the same velocity 
unity, i.e. {vj = 1 V i = 1, 2, • • • , A^}. Pick up two particles at random; let them be i and 
J, and i ^ j. Reset their velocities as per the following iteration rule, 



Vi new 



fj(old) + f j(old) 
(old) — Vi{o\d) 



v,{new) = (108) 

Repeat the above several times. After initial warm up time of say AN iterations or 
so, the velocities of the pair of particles you pick up in all subsequent iterations are the 
desired pairs of independent Gaussian random numbers with mean zero and variance 
unity. Gaussian of desired mean, say fx and standard deviation, say a can be obtained 
by the transformation x = av + ji. Note that X^i^i "^"i = N dX all stages of iteration. 
This algorithm is found to be ten times faster than the Box-Muller algorithm. For most 
applications, it is adequate if ten thousand to hundred thousand particles are considered, 
see [0] for details. 



Assignment 17 

(a) Prove analytically that the Fernahdez-Criato algorithm leads to inde- 
pendent Gaussian random numbers. Prove that the sampling is ergodic. 
Seefg. 

(b) Generate a large number of Gaussian random numbers employing 
Fernandez-Criado algorithm and plot their frequency distribution. Com- 
pare with the exact Gaussian. 



We have implemented the algorithm proposed by Fernahdez-Criado and a sample result 



is depicted in Fig. 13. Ten thousand particles were considered. The first 40, 000 iterations 
were considered as warm up time. We generated twenty thousand numbers and collected 
them in thirty equal bins. The statistical error (one - sigma confidence interval) on the 
counts in each bin were also calculated from the sample fluctuations. These are also 



depicted in Fig. 13 



10.4 Metropolis Sampling 

An interesting technique to sample from a probability distribution is the one proposed by 
Metropolis and his collaborators |^2|, in the context of generating microstates belonging to 
a canonical ensemble. Metropolis algorithm is widely used in Monte Carlo simulation 
of models in statistical physics. 

Here I shall illustrate the technique for sampling from an arbitrary discrete distribution 
/(i), where the random variable takes discrete integer values i between 1 and N . We 
call 1,2,---,A^ as states and denote the set of all states by = {1,2,---A^}. Start 
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Figure 13: Sampling from a Gaussian of mean zero and variance unity employing the technique 
proposed by Fernandez and Criado [39]. 10,000 particles were considered; warm up time was 
40, 000 iterations; 20, 000 random numbers were generated and assembled in thirty equal bins; the 
error bar corresponds to one-sigma confidence in each bin. The solid curve is the exact Gaussian 

with an initial arbitrary state xq belonging to 0,. The Metropolis sampling technique 
generates a Markov chain of states xi(e 0),X2(G Q), ■ ■ ■ Xjn-i(£ 17). For m oo, 
{xm+ii& ^),Xm+2{^ ^)) ' ' '} shall have the distribution {/(i) : i = 1, 2, • • • , N}. A Markov 
chain is a stochastic process whose 'past' has no influence on the future if its 'present' is 
specified. In other words, for a Markov Chain, we have, 

P {Xk < x\Xk-l,Xk-2, ■ ■ ■ ,xo) = P {Xk < x\xk-i) ■ (109) 

We call {xm+i, • • •} the desired ensemble of states. If Ni is number of elements in 

the ensemble whose state index is i, then Ni divided by the total number of members 
in the ensemble is the probability f{i)- For such asymptotic convergence to the desired 
ensemble, it is sufficient though not necessary that 

/(x,) Wixj ^ Xi) = f{xj) Wix^ ^ Xj). (110) 

where W{xj <— Xi) is the probability of transition from state Xi to state xj. Equation 
( |11C| ) is called the detailed balance. For conservation of probability, the transition 
probabilities should obey the following condition, 

Y^w{i^j) = iyj. (Ill) 

i 
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The detailed balance condition does not specify uniquely the transition probabilities from 
one state to the other. A simple choice of the N x N transition matrix W with elements 
Wij = W{i <^ j) which is consistent with the detailed balance is given by, 

W,j = W^, min (l,M) for i^j, 



mj 

N 



^iJ = ^jj + E ^ij ™ [0,1- j^], (112) 



m 



where the matrix W* with elements W^j is an arbitrary symmetric stochastic matrix with 
positive elements. We call W* the trial matrix. The sum of the elements in each row as 
well as each column of the trial matrix W* is unity. As we shall see below, we need W* 
matrix to select a trial state from the current state. Hence W* is chosen conveniently to 
make the selection of the trial state simple. 



Assignment 18 

Verify that the transition matrix W whose elements are calculated as per 
the prescriptions in Eqns. (11121) obeys the detailed balance given by Eq. 
ITC 



The implementation of the Metropolis sampling procedure proceeds as follows. Let 
Xj(G ri) be the current state (or value). We select a trial state xt, randomly and with 
equal probability from amongst the states of = {1, 2, • • • N}. This in other words 
means that W-'j = l/N V i,j = 1,2,---,N. Calculate the ratio w = f{xt)/f{xi). If 
u; > 1, accept the trial state and set Xj+i = xt- If u; < 1, then generate a random number 
^ uniformly distributed in the range to 1. If ^ < if, then accept the trial state and 
set Xj+i = Xt- li > w, reject the trial state and set Xj+i = Xi and proceed. It may 
be necessary to generate several values of x, starting from an initial choice xq, before the 
string acquires the desired distribution. A good choice of xq is that state for which the 
probability is maximum. 

Let us consider a simple example with three states = {1, 2, 3} with |/) = (/i, /2, /s)' = 
(1/4, 5/12, 1/3)'. The matrix W* is taken as W*j = 1/3 V The transition matrix 

constructed by the prescription, see Eq. (|112|), is given by, 




W = 1/3 8/15 1/3 . (113) 



As can be readily seen, the matrix W above, has the following properties. 

1. Wij > V This is called the strong ergodicity condition. 

2. J2i=i = 1 V i- As mentioned earlier, this condition ensures the conservation of 
the probability. The transpose of the matrix W is usually called a stochastic matrix. 
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The eigenvalues of W are 1.0, 0.2, and 0.0833; the largest eigenvalue is unity and is 
non degenerate. The corresponding right eigenvector is |/) = (1/4, 5/12, 1/3)' i.e., 
W\f) = \f). 



Assignment 19 

Construct a matrix W: Wij > V and J2i^i,j = 1^ i- Calculate 
its eigenvalues and the corresponding left and right eigenvectors. Operate 
W on an arbitrary vector several times and show that the resulting vector 
converges to a unique vector which is the eigenvector of the matrix W, 
corresponding the largest eigenvalue unity. 



Thus repeated application of W on any arbitrary vector \u) with {f\u) ^ 0, will 
asymptotically take the vector to |/). We say |/) is the equilibrium probability vector 
representing the equilibrium ensemble. Any initial ensemble represented by \u) with non- 
zero overlap with the equilibrium ensemble, will evolve to the equilibrium ensemble. The 
above results are true in general for any positive stochastic matrix W, by virtue of the 



Peron theorem |43] 



Perons's theorem states that a positive matrix W has an eigenvalue A, which is real, 
positive and non- degenerate and which exceeds in modulus all the other eigenvalues. To 
this dominant eigenvalue there corresponds an eigenvector with all its elements positive. 
In our example, the transpose of the positive matrix W is stochastic, i.e., J2i = 1) 
and hence the dominant eigenvalue is unity. 

Peron's theorem can be seen as follows. Consider the following left eigenvalue equation, 

(L| W = {L\ X , (114) 

where {L\ is the left eigenvector corresponding to the eigenvalue A. We can obtain the 
upper and lower bounds of the eigenvalue as follows. 

TV 
i=l 

N N 
Lmin'^Wij < XLj < Lmax'^Wij, (116) 

i=l i=l 

where Lmax = max{Lk} and Lmin = min{Lk}. Since Wij = 1, it follows, 

Lmin ^ XLj < Lmax ■ (H''^) 

Consider the space of all positive vectors {{L\; Lj > V j = 1, 2, • • • , N}. Then 

Lmin ^ \ ^ Lmax \_i ■ /-iio\ 

— — < A < — — Vj. (118) 
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The minimum value that L^ax/Lj can take as j runs from 1 to is unity; similarly, the 
maximum value that Lmin/Lj can take is unity. Thus we get 1 < A < 1, which imphes 
that A = 1. Thus if the left eigenvector is positive then the corresponding eigenvalue is 
unity. 

Consider now the space of vectors {(-^^|} such that each vector has some of its compo- 
nents positive and some negative. We note that L^ax > and Lmin < 0. The bounds can 
be easily worked out and are given by, 

max<^ 7 ' 7 

Thus we see that A = 1 is the largest eigenvalue and the corresponding left eigenvector 
is positive. It is easily verified that {U\ = c (1 1 1 • • • 1), the constant vector, is the 
left eigenvector corresponding to the eigenvalue A = 1. The constant c can be fixed by 
choosing a suitable norm. 

We can easily show that the eigenvalue A = 1 is non degenerate. To this end, assume 
the contrary. Let {V\ be another positive eigenvector corresponding to the eigenvalue 
A = 1. Then a linear combination of {U\ and {V\ given by {r]\ = a{U\ + P{V\ is also an 
eigenvector. We can choose a and P such that {r]\ has some components positive and some 
negative which contradicts the fact that the eigenvector is positive. This completes the 
proof. 

Let us now consider the right eigenvector \R), corresponding to A = 1. It is easily 
proved that \R) is positive. Consider the eigenvalue equation 

(VFi,i - l)Ri + VFi,2i?2 + • • • + Wi^nRn = 0. (120) 

Let us assume i?i is negative. The first term in Eq. ( |120D is positive since Wi^i < 1. 
Hence at least one of the other components of \R) must be negative. In other words one 
of the elements of the set {R2, R3, • • • Rn} must be negative to render the sum in Eq. 
( |12C| ) zero. Without loss of generality we take R2 to be negative. Consider the eigenvalue 
equation, 

W2,lRl + {W2,2 - l)R2 + W2,3i?3 + • • • + W2,nRn = 0. (121) 

Add Eq. dm) to Eq. ([T21I ) and get, 

N 

(H^i,i + W2,i - l)Ri + {Wi,2 + W2,2 - 1)R2 + 5](^i,i + W2,j)Rj = 0. (122) 

3=3 

In the above, {Wi^i + W2,i) < 1, Ri < 0, {Wi^2 + ^^2,2) < 1 and R2 < 0. Therefore we 
find that the first two terms in the above equation are positive. Hence at least one of the 
elements of the set {R3, Ra ■ ■ ■ Rn} must be negative to render the sum zero. Without 
loss of generality we take this as R3. Arguing along the same lines we show that if Ri is 
negative then all the other components of the vector \R) are also negative. This in other 
words means that the right eigenvector corresponding to the largest eigenvalue A = 1 is 
positive. We call \R) the equilibrium eigenvector. 

Numerically the equilibrium ensemble for the three-state model is constructed as fol- 
lows. Start with an arbitrary state xq £ ri; select a trial state xt £ with equal probability. 



< A < 1. 



:ii9) 
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Note that we have taken W^j = 1/3 V Calculate w = f{xt)/f{xo). Accept the trial 
state as the next state with a probability given by minimum of If accepted set 

xi = xt, if rejected set xi = xq; and proceed. Thus we get an ensemble of states. 

Peron's theorem is special case of a more general theorem on the eigenvectors and 
eigenvalues of non-negative matrices proved later by Frobenius. A proof of the Probenius 
theorem can be found in the book on matrices by Gantmacher [^]. As per this theorem, 
asymptotic convergence to equilibrium vector is possible even under weaker condition. 
Some or several elements of the transition matrix W can be zero. In other words Wi,j > 
Vi,j and of course J2i^i,j = 1- such cases it is adequate that {W"^)-j > V i,j 
and for an m < oo; m may depend on i and j. Physically this means that any state is 
accessible from any other state in finite number of steps. 

This weaker condition helps us make a better choice of the trial matrix W*. Some or 
several elements of this matrix can be zero. In practical terms it means that the trial state 
Xt can be selected randomly from a small neighbourhood of the current state Xj. Thus we 
set Xt = Xi + rji where r/j is a random integer from — e to +e. In this approach, it may so 
happen that the trial value lies outside the range of x. In such a situation, keep generating 
new trial values until, you get one within the range. The choice of e is crucial. If e is too 
large, then the fraction of the accepted trials would be too small and the sampling poor 
and inefficient. On the other hand if e is too small, then even though a large number of 
trials would get accepted, the value of x would remain close to the starting point over 
several trials and hence not span the entire range of x quickly and efficiently. A good 
criterion is to fix e such that the acceptance is between 30% and 50%. 



Assignment 20 

Employ Metropolis sampling technique to sample from a) binomial distri- 
bution and b) Poisson distribution. 



In the above we have discussed the Metropolis technique with respect to discrete distri- 
butions. The method can be readily extended to sample from continuous distribution. 



Fig. |lj depicts the results of sampling from an exponential density employing Metropo- 
lis sampling. 

A very large number of transformations, tricks, and formulae have been discovered for 
sampling from a variety of non-uniform distributions. I shall not discuss them here. These 
techniques are scattered over several publications on Monte Carlo and its applications to 
different disciplines. 
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Figure 14: Sampling from an exponential density employing Metropolis algorithm, e = 3 gives 
an acceptance of 30% — 50%. The solid curve is the exact exponential distribution. The histogram 
represents the frequency distribution of a sample of size 10, 000. 



Assignment 21 

Employ Metropolis technique to sample from the distribution f{6) = 
exp [acos{6)], for < < tt, where a is a parameter. The optimal value of 
e that would give 30% to 50% rejection would depend on the value of a. 
Compare your results with the exact. 



11 ANALOGUE MONTE CARLO 

Consider evaluation of the expectation value (also known as the mean) of a function h of 
the random variable X. Formally we have, 

/+00 
h{x)f{x)dx, (123) 
-oo 

where f{x) is the probability density function of the random variable X. h{x) is usually 
called the score function. How do we estimate /i ? Let us first consider analogue simu- 
lation. A Monte Carlo simulation is called analogue simulation, if it does not employ any 
variance reduction devices. The simulations that employ variance reduction techniques we 
shall term as biased simulation. In analogue Monte Carlo, we sample randomly a sequence 
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of {xi : i = 1,2, ■ ■ ■ , N}, from the density f{x) and write, 



hN 



1 

N 



N 

^h{xi 
1=1 



(124) 



In the hmit N ^ oo, h]\[ ^ fi. Also by the Central Limit Theorem, in the limit N ^ oo, 
the probability density of hj\[ tends to a Gaussian with mean fi and variance cr'^/N, where 



[h{x) — /i] f{x)dx. 



(125) 



Thus we say that Analogue Monte Carlo estimate of is given by Hn ± ct/v A^, where 
±a/^/N defines the one-sigma confidence interval. This means that with a probability p 
given by. 



p 



exp 



N{x - n)' 
2^ 



dx 







£'exp 





dx 



0.68268, 



(126) 



we expect to lie within ziza/VN around /i, if is sufficiently large. 

First we note that we do not know a. Hence we approximate it by its Monte Carlo 
estimate Sjy, given by 
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1 



TV 



N 



[ Xi 



i=l 



1 



N 



1 2 



Xi 



i=l 



(127) 



The quantity ±Sn/VN is called the statistical error. Notice that the sample size N 
must be large for the above estimate of the error (and of course the mean) to hold well. 
Sometimes it would worth the effort to test if the sample mean has acquired a normal 



distribution, see for example the test devised by Shapiro and Wilks |46]. Normality tests 
are useful in a biased Monte Carlo simulation. 

The statistical error decreases as inverse of the square root of the sample size. This is 
rather slow - logarithmically slow. For example if we want to decrease the statistical error 
by a factor of two we must increase the sample size by a factor of four. The computing 
time increases linearly with the sample size. Hence invariably one finds that analogue 
Monte Carlo is practically impossible. But there is a way out. 

The way out is to resort to techniques that reduce the variance without altering the 
mean. These are called variance reduction techniques and in what follows I shall 
describe the basic principle behind variance reduction techniques through what is called 
the importance sampling. 
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12 IMPORTANCE SAMPLING 



Importance sampling helps us sample from the important regions of the sample space. 
Consider the problem described in the last section. We sample {xi : i = 1,2, ■ ■ ■ , N} from 
an importance density ^'(a;) instead of the analogue density f(x). To preserve the mean 
we define a modified score function H{x) given by, 

H(x) = hix)^. (128) 

The expectation value of H is evaluated over the importance density g{x); this is identically 
equal to the expectation value of the original score function h over the analogue density 

fix): 

/+00 
H{x)g{x)dx 
-oo 

h{x)——g{x)dx 
oo g[x) 

h{x)f{x)dx 

-oo 

= ii{h). (129) 

Thus we sample {xi : i = 1,2, - ■ ■ ,N} from g{x), and calculate the ratios {f{xi)/g{xi)}. 
The biased Monte Carlo estimate of /i is given by 

fi^^^EMx.jff. (130) 

In the limit N ^ oo, Hn — yU- Let us now calculate the statistical error associated with 
Hi\i. It is adequate if we consider the second moment, since we have formally shown that 
the mean is preserved under importance sampling. We have 

/-|-oo 
H^{x)g{x)dx 
-oo 



-oo 

^ /■+0O h{x)f{x)h{x)nx) 

J-oo g{x) g{x) 
+00 r/(^) 



- L 

For the analogue simulation, 



9{x) 



h\x)f{x)dx. (131) 



/' + 00 
h^{x)f{x)dx. (132) 
-oo 



Thus if we choose properly the importance function g{x) and ensure that the ratio f{x)/g{x) 
on the average is substantially less than unity, then wc can make M^(i7) to be much less 
thdJiM^{h). This in essence is the basic principle of variance reduction techniques. Thus, 
sampling from an importance density helps us estimate the mean with a much better statis- 
tical accuracy for a given sample size. In fact it is due to the variance reduction techniques 
that Monte Carlo simulation of many problems have become possible. 
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12.1 Exponential Biasing 

Let me illustrate the use of importance function on a simple problem where all the relevant 
quantities can be calculated analytically. The problem has its origin in radiation transport 
through thick shields. The actual problem of Monte Carlo simulation of the transport is 
relatively complex and a detailed discussion of this is not relevant for the purpose here. 
Hence, I shall be brief. Consider particles, neutrons or gammas, incident normally on the 
left face of a slab of material of thickness T. The particle would penetrate a distance x into 
the shield with a probability S exp(— Sx) and enter into a collision event at a point between 
X and x + dx with probability T,dx. Here is the mean free path (mfp). We can measure 
all distances in units of mfp and hence set S = 1. The collision can lead to absorption 
in which case the particle history is terminated. On the other hand if the collision is a 
scattering event, the history continues. In a simple-minded picture, the scattering is taken 
as isotropic. In one-dimensional model, this means that the particle is directed toward 
the right or the left face of the slab, with equal probability. The particle travels along 
the scattered direction and has a collision event at some distance. The history continues 
and is constituted by a series of alternating free flights and collisions. The history ends 
when the particle is absorbed inside the shield or escapes the shield. When the particle 
escapes through the right face it scores unity; otherwise the score is zero. The scores from 
a large number of particle histories are accumulated. The average of these scores is the 
Monte Carlo estimate of the mean transmission defined as the fraction of the incident 
particles that escape through right face of the shield. The sample fluctuations give an 
estimate of the statistical error. If the thickness of the shield exceeds 20 mfp or so, the 
problem becomes intractable by analogue Monte Carlo. Variance reduction techniques 
become imperative. 

An importance sampling technique often used in this context is called the exponential 
biasing. In the simplest version of exponential biasing, the inter collision distance is 
sampled from the importance density bexp{—bx), where the biasing parameter b has to 
be optimized for minimum variance. For more details on exponential biasing see |47]. 



In the simple problem, we shall be interested in calculating the fraction of the incident 
particles that escape through the right face of the shield, without undergoing any collision 
whatsoever. This amounts to the following: In analogue simulation, sample a random 
number from the exponential density. If this exceeds T, score unity. Otherwise score 
zero. Collect the scores from histories and calculate the mean and the statistical error. 
In the biased simulation we sample x from the importance density function 6exp(— 
where we have assumed that we know the value of 6 = 6, for which the variance is 
minimum. Let us denote this sampled value of x as Xj. If Xi exceeds T, then score 
Wi = f{xi)/g{xi,b) = exp[— — b)]/b. Otherwise score zero. Accumulate the scores 
from a large number of histories and calculate the mean and the statistical error. 

Fig. [l5| a depicts the probability density function f{x) and the score function h{x) for 
the case with T = 5 mfp. Fig. [l5| b depicts the importance density function g{x,b = b) 
and the modified score function H{x) = f{x)h(x)/g{x,b) for the case with T = 5 mfp. 

For this problem, all the quantities can be calculated analytically. The expressions for 
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Figure 15: (a) The probability density f{x) and the score function h{x) for T = 5 mfp; (b) The 
importance density function g{x) and the modified score function H{x) = h{x)f{x)/g{x). b = 0.18 

the average and variance are, 

fi = exp(— x) dx = exp(— r), (133) 

Jo 

= (134) 

Table (6) gives the mean, relative standard deviation and the number of histories that 
would be required to estimate the mean in an analogue simulation, within ±10% of the 
exact. 

Table 6: Analogue simulation: exact analytical results 



T 








3 


4.98 X 10-2 


4.37 


1.91 X 10^ 


5 


6.74 X lO"-"^ 


1.21 X 10^ 


1.47 X 10^ 


10 


4.54 X 10"^"^ 


1.48 X 10^ 


2.20 X 10^ 


20 


2.06 X 10-^ 


2.20 X lO'' 


4.85 X 10^° 



We see from Table (6) that as the thickness increases the mean transmission decreases 
and the relative fluctuation increases. We find that the number of histories required to 
predict the mean within ±10% statistical error is over 48 billion for T = 20 mfp, a task 
which is clearly impossible on any computer. 
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Let us see how the use of importance samphng renders possible the impossible. Under 
importance sampling, the variance can be calculated analytically and is given by, 



It is quite straight forward to calculate the value of b for which the variance is minimum. 
We get, 

6=1 + ^-^1 + ^. (136) 

Table (7) presents T,b,a/iJ, and the number of histories required to estimate the mean, 
under biased simulation within ±10% of the exact. 

Table 7: Biased Simulation: exact analytical results 



T 


b 


a/fi 


N 


3 


.28 


1.95 


381 


5 


.18 


2.55 


651 


10 


.09 


3.65 


1329 


20 


.048 


5.18 


2687 



We find from Table (7) that the use of importance sampling would lead to a consid- 
erable reduction of variance especially for large T. Take the case with T = 5 mfp. The 
use of importance sampling would reduce the statistical error by a factor five or so. As 
a consequence a sample of size 650 is adequate to estimate the mean whereas analogue 
simulation would require a sample of size 14, 700. The results for T = 20 mfp are more 
dramatic and bring home the need and power of importance sampling. The statistical 
error gets reduced by a factor of 4247. We need to simulate only 2687 histories to estimate 
the mean within ±10% of the exact, as compared to 48 billion histories required under 
analogue simulation. 

We have simulated explicitly 10000 histories under importance sampling, as follows. 
We sample Xi from the importance density, 6exp(— 6x), and calculate the mean of the 
modified score function as, 

1 ^ 

Hn = j^Y.H{xi), (137) 
1=1 
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where, 



■ exp 



l-b]x 



0, 



, if Xi > T, 
if Xj < T. 



(138) 



The statistical error is calculated as, 



N 



N 



J2HHx,)-Hj,. 



(139) 



Table (8) gives the estimated mean transmission Hf^f, the relative statistical error, and the 
actual deviation of the Monte Carlo estimate from the exact transmission. 



Table 8: Results of Monte Carlo simulation of 10, 000 histories with importance sampling and 
comparison with exact analytical calculations 



T 




'^i^^ X 100 


^^-f" X 100 


3 


4.94 X 10" 


-2 


±2.0% 


-0.8% 


5 


6.76 X 10" 


-3 


±2.6% 


±0.3% 


10 


4.52 X 10" 


-5 


±3.7% 


-0.4% 


20 


1.96 X 10" 


-9 


±5.4% 


-4.9% 



We observe from Table (8) that we are able to make a good estimate of the mean trans- 
mission employing importance sampling. The corresponding results obtained by analogue 
simulation of 50000 histories (five times more than what we have considered for analogue 
simulation) are given in Table (9). 

We find from Table (9) that analogue simulation of the problem with T = 20 mfp is 
impossible. On the average, we can expect one in 49 million numbers sampled from the 
exponential density, to have a value greater than 20. The chance of getting a score in a 
simulation of 50000 histories is practically nil. 

12.2 Spanier Technique 

In several situations, like the one discussed in section 12.1, it is possible to make a good 
guess of the shape of the importance function g{x, b), where b is an unknown parameter to 
be optimized for minimum variance. Spanier has proposed a technique for optimizing 
the parameter b, through processing of the Monte Carlo results from relatively small 
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Table 9: Results of analogue simulation of 50, 000 histories and comparison with exact analytical 
results 



T 




Aftjv X 100 


''^-'^ X 100 


3 


5.03 X 10-2 


±1.9% 


±1.8% 


5 


6.68 X 10"^ 


±5.5% 


-0.9% 


10 


6.16 X 10"^ 


±57.7% 


±35.7% 


20 









samples of histories. The technique consists of expressing the second moment, for the 
biased problem, see Eq. (|131|) as, 

,^ /, N f h{x)f{x) h(x)f(x) , r\ , /-, 
J g{x,bi) g{x,b) 

where b is the guess value of the parameter b and {6j : i = 1, 2, • • • , M} are the prescribed 
values of b spanning its full range or a conveniently chosen interval in its range. 

The implementation of Spanier's technique proceeds as follows. Start with a guess 
value b. Generate a set of A*" values of x by random sampling from the importance density 
g{x,b). Calculate, 

for , = 1,2,...,M. (141) 

Nf^^ g{Xi,bj) g{xi,b) 

Thus we get M2 at M distinct values of b. Find the value of b at which the second moment 
is minimum; use this as b in the next iteration and generate a second set of histories. 
Repeat the procedure until b converges. N can be small for the purpose of optimizing b. 

Fig. |16|A presents the results for the problem with T = 20 mfp at three stages of 
iteration starting with an initial guess of 6 = 0.1. The number of histories generated at 
each stage of iteration is 500. Fig. |l^B depicts the second moment as a function of b 
generated with the converged value of 6 = .049, along with the exact analytical results. 

Spanier's technique is quite general and powerful. Application of this technique to radi- 



ation transport problem can be found in |48, Several modifications and improvements 



to this technique have been recommended and those interested can refer to |50|. 

13 CLOSING REMARKS 

In this monograph I have made an attempt to describe the fundamental aspects of Monte 
Carlo theory in very general terms without reference to any particular topics like neutron 
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Figure 16: (A) '- - ' denotes the results generated by 6 = 0.1; the minimum of the curve occurs 
at 6 = 0.048; ' — ' represents the results generated with b = 0.048 and the minimmn occurs at 
b = 0.049. '- ' denotes the results generated with b = 0.049 and the minimum is found to be 
at 6 = 0.049. 500 histories were generated at each stage of iteration. (B) '-' depicts the second 
moment generated by the converged value b = 0.049 and '— •' depicts the exact analytical results. 



transport or problems in statistical mechanics. 

An understanding of the fundamental ideas of probability theory: sample space, events, 
probability of events, random variables, probability distribution, moments, cumulants, 
characteristic function, etc, is a must for appreciating the basis of Monte Carlo techniques. 
I have given a brief introduction to these topics. 

The Central Limit Theorem and its use in estimating Monte Carlo error bars is an 
important topic and has been dealt with in detail, preceded by a brief discussion on the 
Chebyshev inequality and the law of large numbers. I have also very briefly touched upon 
the generalization of the Central Limit Theorem, namely the Levy stable law. 

A long sequence of pseudo random numbers constitutes the backbone of any Monte 
Carlo simulation. Often one finds that many Monte Carlo practitioners tend to remain 
ignorant of how these numbers are produced. They often consider the random number 
generator routine in their personal computer, the workstation or the super computer, as 
a black box. This tendency must go. The reason is simple. There is no random number 
generator which is flawless and which would yield random numbers useful for all kinds of 
simulations. Take for example the most popular and widely used of the random number 
generators. It is based on linear congruential recursion; it suffers from a very serious 
defect: the lattice and Marsaglia hyper plane structures that ensue when you embed the 
random numbers in two and higher dimensional phase space. If your problem is such that 
the Marsaglia lattice structure would influence your results significantly then you should 
seriously consider alternate means of generating pseudo random numbers. I have given a 
brief outline of these and related issues in this monograph. 

I have also discussed some tests of randomness of the sequence of pseudo random 
numbers. Of course we understand that there is absolutely no guarantee that the sequence 
of pseudo random numbers given by the generator in your machine is adequately random 
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for the particular Monte Carlo application you have on hand; there can at best be only 
safeguards: carry out certain tests of randomness, like the uniformity test, correlation tests, 
run test, the tests based on embedding the sequence in phase space through construction of 
delay vectors or any other sophisticated tests you may think of. In fact the Monte Carlo 
algorithm you have developed itself can be used to test the randomness of the pseudo 
random number sequence. 

Random sampling from non-uniform distributions constitutes the core of Monte Carlo 
simulations. I have dealt with inversion, rejection, and Metropolis techniques for sampling 
from a given distribution. The last of these, namely the Metropolis algorithm, is usually 
employed in simulations of problems in statistical mechanics. 

Almost all Monte Carlo simulations employ, in some form or the other, techniques 
of variance reduction. I have tried to convey the basic principles of variance reduction 
through a discussion of the importance sampling device. I have illustrated the use of 
importance function by considering exponential biasing on a simple problem. I have also 
dealt briefly with the Spanier technique for optimizing the biasing parameter in importance 
sampling. 

Let me end by saying that there are several excellent books and review articles on 
Monte Carlo theory and practice, which you would like to consult during your 'Monte 



Carlo learning' . Some of these books that have caught my attention are listed under |51]. 
Let me wish you all a merry time with Monte Carlo games. 
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